ABSTRACT


Maritime  Interdiction  Missions  (MIM)  are  of  great  interest  and  high  operational 
importance  to  the  U.S.  Navy,  the  U.S.  Coast  Guard,  and  allied  forces.  The  MIM  scenario 
discussed  in  this  thesis  includes  an  area  of  interest  with  multiple  neutral  and  hostile 
vessels  moving  through  this  area,  and  an  interdiction  force  consisting  of  an  unmanned 
aerial  vehicle  (UAV)  and  an  intercepting  vessel,  whose  objectives  are  to  search,  identify, 
and  intercept  hostile  vessels  within  a  given  time  frame.  In  this  thesis,  we  develop 
Stochastic  Dynamic  Programming  models,  which  represent  the  MIM  scenario.  While  a 
theoretical  method  of  producing  an  optimal  decision  policy  for  the  interdiction  force  is 
presented  in  this  thesis,  it  is  shown  that  such  computation  is  intractable.  The  models 
developed  in  this  study  are  used  to  analyze  and  evaluate  the  performance  of  a  heuristic 
decision  policy  that  we  recommend  to  be  applied  by  the  interdiction  force.  Based  on  a 
numerical  case  study,  which  includes  several  representative  MIM  scenarios,  we  show  that 
the  number  of  intercepted  hostile  vessels  following  the  heuristic  decision  policy  is  at  least 
60%  of  the  number  of  hostile  vessels  intercepted  following  the  optimal  decision  policy. 
Based  on  the  results  of  the  heuristic  performance  in  the  numerical  case  studies,  we 
recommend  the  implementation  of  our  suggested  heuristic  in  an  operational  decision  aid 
for  Maritime  Interdiction  Missions. 
EXECUTIVE SUMMARY


Maritime  Interdiction  Missions  (MIM)  are  of  great  interest  and  high  operational 
importance  to  the  U.S.  Navy,  the  U.S.  Coast  Guard,  and  allied  forces.  The  MIM  scenario 
discussed in this thesis includes an area of interest (AOI) with multiple neutral and hostile
vessels  moving  through  this  area,  and  an  interdiction  force  consisting  of  an  unmanned 
aerial  vehicle  (UAV)  and  an  intercepting  vessel,  whose  objectives  are  to  search,  identify, 
and intercept hostile vessels within a given time frame. The goal of this thesis is to
optimize the operational policy used in the employment of such an interdiction force. We
discuss why finding the optimal decision policy for the interdiction force is intractable,
and we suggest a sub-optimal yet effective and practical heuristic decision policy whose
performance does not fall far behind that of the optimal policy.


Scenario  and  Concept  of  Operations 

The  MIM  scenario  discussed  in  this  thesis  includes  multiple  moving  neutral  and 
hostile vessels and an interdiction force comprising two assets: a UAV that detects,
identifies, and tracks suspected vessels and a navy vessel that intercepts suspected vessels
indicated  by  the  UAV.  The  goal  of  the  interdiction  force  is  to  intercept  as  many  hostile 
vessels  as  possible  in  a  given  time  frame.  We  assume  the  existence  of  some  prior 
intelligence  regarding  the  expected  movement  patterns  of  both  neutral  and  hostile  vessels. 
The  UAV  moves  around  the  AOI  while  searching  for  vessels.  Once  a  vessel  is  detected, 
the  UAV  tracks  that  vessel  for  a  given  time  period,  while  attempting  to  identify  that 
vessel  as  either  a  neutral  or  a  hostile  one.  The  UAV  identification  is  accomplished  by 
utilizing  a  sensor  (e.g.,  electro-optical  or  radar  sensors)  to  recognize  the  physical 
signature  of  that  vessel  and  its  movement  pattern.  Once  the  UAV  has  flagged  a  vessel  as 
a  suspicious  one,  the  interception  vessel  is  called  in  to  physically  intercept  that  suspicious 
vessel. 




Methodology and Models Used


We developed a Stochastic Dynamic Programming model, which represents a
system of neutral and hostile vessels together with an interdiction force operating inside
the AOI. This model is used to optimize the operational policy of the interdiction force in
this MIM scenario. The model includes the states of the system, the decisions the
interdiction force is required to make, the information obtained by the interdiction force
following such a decision, the manner by which the system evolves over time according
to the decision policy being used, and the overall objective function.

While  this  model  can  theoretically  be  used  to  find  the  optimal  decision  policy  for 
the  interdiction  force,  we  show  why  this  problem  is  intractable  and  cannot  be 
implemented  in  a  real-world  operational  scenario.  Instead,  we  suggest  a  heuristic 
approach  to  constructing  an  effective  sub-optimal  decision  policy  and  present  its 
performance  in  achieving  the  interdiction  force’s  goal.  The  analysis  of  this  heuristic 
decision  policy  is  accomplished  by  means  of  bounding  its  expected  performance  based  on 
a  simplified  dynamic  programming  model  of  the  MIM  scenario  at  hand. 

Both  of  the  above  models — the  original  dynamic  programming  model  and  the 
simplified  one — have  been  implemented  in  MATLAB  with  the  aim  of  analyzing  a 
numerical  case  study  of  several  representative  MIM  scenarios.  This  numerical  case  study 
is  used  to  evaluate  the  performance  of  the  heuristic  decision  policy. 


Results 

The  analysis  of  several  MIM  scenarios  as  part  of  the  numerical  case  study,  along 
with  a  sensitivity  analysis  and  parametric  study,  has  shown  that  the  number  of  intercepted 
hostile  vessels  following  the  heuristic  decision  policy  is  at  least  60%  of  the  number  of 
hostile  vessels  intercepted  following  the  optimal  decision  policy  (which  is  intractable  to 
produce). Furthermore, we present an approximate optimization for one of the key
operational parameters in the employment of the interdiction force: how often one should
call the interception vessel to intercept suspicious vessels being tracked by the UAV. We
found that when the boarding time of an intercepted vessel is negligible, it is
best to call the Interceptor even when there is only a small probability that the tracked
vessel is a hostile one, whereas when the boarding time is on the order of an hour, it is best
to call the Interceptor only when there is at least a 40% chance that the tracked vessel is a
hostile one.


Conclusions 

Based  on  the  results  of  the  heuristic  performance  in  the  numerical  case  studies,  we 
recommend  the  use  of  this  heuristic  in  any  MIM  scenario  which  closely  resembles  the 
MIM  scenario  discussed  in  this  thesis.  This  heuristic  can  be  effectively  implemented  in 
an  operational  decision  aid  for  Maritime  Interdiction  Missions. 
I.  INTRODUCTION


A.  OVERVIEW

Maritime  Interdiction  Missions  (MIM)  are  of  great  interest  and  high  operational 
importance  to  the  U.S.  Navy,  the  U.S.  Coast  Guard,  and  allied  forces.  MIM  typically 
involve  search,  identification,  and  interception  of  suspected  vessels  and  are  the 
operational  background  to  this  thesis.  We  develop  a  stochastic  dynamic-programming 
model  representing  a  maritime  area  of  interest  with  multiple  moving  neutral  and  hostile 
vessels and an interdiction force comprising two assets: an unmanned aerial vehicle
(UAV) that detects, identifies, and tracks suspected vessels and a navy vessel that
intercepts suspected vessels. The model developed in this thesis generates optimal and
near-optimal policies for MIM that may lead to better utilization of the interdiction force.
The model can be implemented in a tactical decision aid and produce real-time
recommendations for courses of action to an interdiction force commander in MIM
scenarios.

B.  BENEFITS OF STUDY

The  near-optimal  policy  for  employing  an  interdiction  force  in  MIM  developed  in 
this  thesis  will  have  operational  value  when  implemented  in  a  real-time  decision  aid. 
Moreover,  the  thesis  is  a  first  attempt  to  model  combined  search-interception  operations 
in  a  stochastic  and  dynamic  setting. 

C.  RELATED WORK

The  field  of  classical  search  theory  has  been  extensively  studied,  as  discussed  by 
Washburn  [1]  and  Stone  [2].  Additional  work  on  route  optimization  for  multiple  sensors 
and  resource-constrained  searchers  has  been  done  by  Royset  and  Sato  [3],  [4];  see  also 
references  therein. 

Dynamic task allocation and vehicle routing have been studied by Smith [5], who
discusses the employment of multiple autonomous vehicles in dynamic environments.
Optimization of the employment of non-reactive sensors has been addressed by Kress,
Szechtman and Jones [6]. Probabilistic search optimization and mission assignment for
heterogeneous agents are discussed by Chung, Kress and Royset [7]. This last study deals
with a situation similar to the one in this thesis, but adopts a heuristic approach without
solution quality estimates in terms of upper and lower bounds. This thesis develops such
bounds for the specific situation with one UAV and one navy vessel.

D.  THESIS ORGANIZATION

This thesis is organized in the following way: Chapter I presents an overview of
the operational scenario and the problem at hand, while listing the main assumptions used
in this research. Chapter II presents the basic development of the dynamic programming
models used in this thesis. In Chapter III we continue to discuss the models presented in
Chapter II while completing the detailed development of all necessary models. Chapter
IV discusses an analysis of a numerical case study, and presents the results and insights
of this analysis. Chapter V summarizes this research and presents the conclusions.
Appendix A offers a list of acronyms and symbols used throughout this thesis, while
Appendix B presents a brief overview of the MATLAB code implemented in this
research.

E.  SCENARIO 

We consider a maritime area of interest (AOI) that contains multiple objects of
interest (objects), some of which are hostile objects (targets) and the remainder neutral
objects (neutrals). The number of targets and the number of neutrals are unknown. The
AOI is subdivided into a number of area cells (ACs). Time is divided into discrete
time periods. All targets are dynamic and move independently according to a known
Markov chain defined on the set of ACs. The neutrals move likewise, but according to a
different Markov chain. Motivated by our discretization of space and time, with
resolution that can be arbitrarily high, and assuming that the AOI is relatively large
compared to the (unknown) number of objects, we neglect the possibility of more than
one object in any specific AC at any given time period. This is an approximation to the
real situation, which is quite reasonable in open-sea scenarios, and which makes the
model tractable. The interdiction force’s UAV is hereafter referred to as a Recognizer and
the navy vessel as an Interceptor. Figure 1 presents an example of such an AOI, with
multiple targets and neutrals arriving, leaving and moving about the AOI as well as a
Recognizer and an Interceptor.

The Recognizer patrols the AOI in a fashion to be determined, with the goal of
correctly identifying as many targets as possible. After arriving at a specific AC, the
Recognizer searches for the presence of an object in that AC. We assume that the
Recognizer  has  perfect  detection  capabilities,  which  means  that  the  Recognizer  can 
determine  with  certainty  whether  the  AC  is  empty  or  contains  an  object.  If  there  is  no 
object  in  that  specific  AC,  the  Recognizer  decides  on  its  next  AC  to  visit.  In  case  there  is 
an  object  in  that  AC,  the  Recognizer  follows  it  for  a  single  time  period  while  trying  to 
identify  it  as  either  a  target  or  a  neutral.  After  this  single  time  period  of  tracking  the 
object,  the  Recognizer  has  to  decide  whether  that  object  is  a  likely  target  or  a  likely 
neutral.  The  Recognizer  flags  a  tracked  object  as  a  likely  target  if  the  probability  that  the 
tracked object is a target is greater than or equal to a predetermined threshold. If a tracked
object  is  flagged  as  a  likely  target,  the  Recognizer  then  calls  in  the  Interceptor  to  intercept 
the  object.  While  waiting  for  the  Interceptor  to  arrive  and  intercept  the  object,  the 
Recognizer  stays  with  the  object.  In  this  thesis  we  make  the  simplifying  assumption  that 
once  flagged  as  a  likely  target,  the  object  is  forced  to  remain  stationary  at  its  location  at 
the time when the Recognizer identified it as a likely target. This assumption avoids the
need for a complicated interception model, which would make our scenario more
realistic but would needlessly complicate the model when one is not concerned with the
interception “end-game.”
Figure 1.  An example of an AOI with multiple neutrals (N) and targets (T), a
Recognizer (R) and an Interceptor (I).


The  Interceptor’s  only  task  is  to  intercept  objects  identified  to  be  likely  targets. 
The Interceptor intercepts the likely target (still being tracked by the Recognizer) and
identifies  it  as  a  (real)  target  or  a  neutral.  The  Interceptor  has  perfect  identification 
capability;  it  can  distinguish  with  certainty  between  a  target  and  a  neutral.  While  waiting 
for  such  tasks,  the  Interceptor  is  stationary  at  a  certain  location  inside  the  AOI  (the 
Interceptor  only  moves  when  called  in  to  intercept  a  likely  target).  The  goal  of  the 
interdiction  force  is  to  intercept  as  many  targets  as  possible  within  a  given  time  horizon, 
while  possibly  taking  into  account  discounting  of  intercepted  targets  (the  discounted 
value  of  an  intercepted  target  is  called  a  reward).  If  the  Recognizer  identifies  an  object  as 
a  likely  neutral  or  if  the  Interceptor  intercepts  an  object  that  turns  out  to  be  a  neutral,  then 
that  object  is  “marked”  as  such  and  is  of  no  future  interest  (e.g.,  some  form  of  a  tag  or  a 
beacon  is  placed  on  that  object  to  avoid  future  “redetection”  and  tracking  by  the 
Recognizer).  Figure  2  summarizes  the  concept  of  operations  (CONOPS)  used  by  the 
interdiction  force. 
Figure 2.  Summary of CONOPS of the interdiction force


F.  ASSUMPTIONS 

The scenario analyzed in this thesis and the corresponding model include several
key assumptions, which are discussed next.

1.  Arrival  and  Movement  of  Targets  and  Neutrals 

We  assume  that  all  objects,  targets  and  neutrals,  independently  enter  the  AOI, 
move  inside  the  AOI,  and  then  leave  the  AOI.  The  objects  do  not  behave  strategically 
and,  therefore,  their  movement  is  not  affected  by  the  actions  of  the  interdiction  force,  that 
is,  neither  the  presence  of  the  Recognizer  nor  the  actions  of  the  Interceptor  affects  the 
behavior  of  the  objects.  We  also  assume  that  the  AOI  is  sparse  enough,  or  the  spatial  and 
time  resolutions  are  high  enough,  such  that  the  probability  of  more  than  one  object  in  an 
AC  at  any  given  time  period  is  negligible. 

At any given time period, new objects may enter the AOI. The set of ACs to
which objects may arrive is called the Boundary of the AOI. At any given time period, at
most one object may enter each of the ACs on the boundary of the AOI with given
probabilities. The entry probability may depend on the type of object (target or neutral)
as well as the location of the AC. ACs not on the boundary have probability 0 of an
arrival from outside the AOI. Once inside the AOI, each object moves according to a
known Markov process with different transition probabilities for targets and neutrals. A
designated AC represents the area outside the AOI to which objects eventually transit
according to the Markov process transition matrix. The arrival process into each of the
ACs on the boundary of the AOI is Bernoulli. Note that a key assumption is that the
likelihood of two or more simultaneous arrivals to the same AC is negligible.

Absent interruptions by the interdiction force, the overall stochastic process
describing the arrivals, transits and exits of objects is a large-scale Markov chain, with
states defined by a vector $(Y_1, \ldots, Y_{|AOI|})$, where $Y_j$ is 1 if a neutral is present in the
corresponding AC, $Y_j$ is 2 if a target is present in the corresponding AC, $Y_j$ is 0 if the AC
is empty, and $|AOI|$ is the number of ACs in the AOI. The state space of this Markov
chain is very large ($3^{|AOI|}$ states), and so it is not used directly in any calculations in this
thesis. Instead of looking at the probability of a state, we consider the marginal
probabilities for each AC, $P\{Y_j = y_j\}$. Our assumptions imply that $Y_1, \ldots, Y_{|AOI|}$ are
independent discrete random variables. We assume that the MIM starts when the system
represented by $(Y_1, \ldots, Y_{|AOI|})$ is at steady-state. In subsequent chapters in this thesis,
whenever we refer to a Markov process or a Markov transition matrix, we refer to the
movement process of some individual object and not the overall large-scale Markov
process discussed above. We assume stationarity in the sense that neither the entry
probabilities nor the in-AOI transition or exit probabilities depend on the time period $t$,
and they do not depend on the interdiction force actions.
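
The marginal-probability view described above can be illustrated numerically. The following Python sketch is not taken from this thesis (whose models were implemented in MATLAB). It assumes a hypothetical function `step_marginals` in which the exit cell is left implicit as the shortfall of each row of the transition matrix, and in which arrivals and internal moves are treated as independent events. Iterating the step to a fixed point is one possible way to approximate steady-state marginals under these assumptions.

```python
def step_marginals(p, P, entry):
    """Advance per-cell presence probabilities by one time period.

    p     : list, p[a] = probability that an object occupies cell a now
    P     : list of rows, P[a][b] = probability of moving from cell a to cell b;
            a row may sum to less than 1, the shortfall being the exit probability
    entry : list, Bernoulli arrival probability per cell
            (zero for cells that are not on the boundary)
    """
    n = len(p)
    nxt = []
    for b in range(n):
        # Probability mass flowing into cell b from inside the AOI.
        # Assumed to stay below 1, in line with the sparse-AOI approximation.
        moved = sum(p[a] * P[a][b] for a in range(n))
        # Cell b is occupied unless neither an internal move nor an arrival occurs.
        nxt.append(1.0 - (1.0 - moved) * (1.0 - entry[b]))
    return nxt


def steady_state(P, entry, tol=1e-12, max_iter=100_000):
    """Iterate step_marginals from an empty AOI until the marginals stop changing."""
    p = [0.0] * len(entry)
    for _ in range(max_iter):
        nxt = step_marginals(p, P, entry)
        if max(abs(u - v) for u, v in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical three-cell corridor: objects enter at cell 0, move right, exit after cell 2.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
entry = [0.1, 0.0, 0.0]
print(steady_state(P, entry))
```

The same step would be applied separately to targets and to neutrals, each with its own transition matrix and entry probabilities.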

2.  Sensor Capabilities

The Recognizer has perfect detection capability, which means that, once in an
AC, the Recognizer detects the presence of an object in that AC with certainty and there
are no false positive detections. Following a detection, the Recognizer tracks the detected
object for a single time period in an attempt to recognize it and determine whether it is a
target or a neutral. The tracking process is fault-free in the sense that the Recognizer
never loses contact with the tracked object; however, the recognition capability is
imperfect and the identity of the object may be subject to false negative (identifying a
target as a neutral) and false positive (identifying a neutral as a target) errors. The
Recognizer utilizes two recognition techniques: movement recognition and signature
recognition. Movement recognition is based on the movement pattern of the object,
attempting to detect anomalies. Signature recognition is based on the physical
characteristics of the object.

The  Interceptor  is  assumed  to  have  perfect  recognition  capability.  Once  the 
Interceptor  has  intercepted  a  suspicious  object  (after  being  dispatched  by  the  Recognizer, 
which  had  flagged  that  object  as  a  likely  target)  it  can  perfectly  determine  whether  this 
object  is  indeed  a  target  or  a  neutral. 
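
To illustrate how imperfect recognition by the Recognizer can feed a threshold rule, a minimal sketch of a single Bayesian update is given below. It assumes a hypothetical binary recognition signal with known true-positive and false-positive rates. These names and the single-signal structure are illustrative assumptions and are not the recognition model developed in this thesis.

```python
def posterior_target(prior, tpr, fpr, signal):
    """Posterior probability that a tracked object is a target.

    prior  : probability that the object is a target before the look
    tpr    : P(signal says 'target' | object is a target)
    fpr    : P(signal says 'target' | object is a neutral)
    signal : True if the sensor reported 'target', False otherwise
    """
    like_target = tpr if signal else 1.0 - tpr
    like_neutral = fpr if signal else 1.0 - fpr
    evidence = prior * like_target + (1.0 - prior) * like_neutral
    if evidence == 0.0:
        raise ValueError("signal has zero probability under the given rates")
    return prior * like_target / evidence


def flag_as_likely_target(prior, tpr, fpr, signal, threshold):
    """Threshold rule: flag the object when the posterior reaches the threshold."""
    return posterior_target(prior, tpr, fpr, signal) >= threshold
```

A false negative corresponds to `signal=False` for a true target, and a false positive to `signal=True` for a neutral; the threshold controls how often the Interceptor is called.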

3.  Separation  of  Detected  Objects 

Once  an  object  is  flagged  as  a  likely  neutral  (at  the  end  of  the  Recognizer’s 
tracking  phase)  or  once  an  intercepted  object  turns  out  to  be  a  neutral,  it  is  taken  away 
from  the  AOI  for  the  remainder  of  the  time  horizon.  We  assume  that  in  both  cases  some 
form  of  a  beacon  or  other  tagging  technique  is  used  to  prevent  this  specific  object  from 
ever  being  of  interest  for  the  interdiction  force. 

4.  Finite Time Horizon

The  time  horizon  is  finite,  which  means  that  the  interdiction  force  does  not  care 
about  events  that  will  take  place  past  that  time  horizon.  In  particular,  a  successful 
interception  of  a  target  past  the  time  horizon  is  considered  to  be  worthless  for  the 
interdiction  force.  This  assumption  may  be  appropriate  in  an  operational  scenario  in 
which  there  exists  a  strict  deadline.  In  scenarios  in  which  a  strict  deadline  does  not  exist, 
this  artificial  time  horizon  is  known  to  have  some  distorting  effect  on  the  results  of  the 
model.  One  can  try  to  reduce  this  end-effect  by  using  a  long  time  horizon. 
II.  MODELS DEVELOPMENT:  MAIN PARTS


A.  DYNAMIC PROGRAMMING FORMULATION

Let $A$ denote the set of ACs in the AOI, $A_0$ denote the area outside the AOI to
which objects eventually transit when they leave $A$, and $S' \subset A$ denote the subset of ACs
at the boundary of the AOI. Let $t$ denote the index of time periods. Let $T$ denote the
finite time horizon of the scenario. As mentioned in the Introduction, we assume that the
grid of ACs and time resolutions are fine enough such that at any given time period there
may be at most one arrival of a new object into each one of the ACs in $A$ and at most one
object present in a certain AC in $A$. The objects enter the AOI and move about the AOI
independently.

The dynamic programming formulation used in this thesis is based on the
conventions found in Powell [8], pp. 129-178.

1.  State 

We define the state as the vector $s = (t, r, i, \pi, \theta)$, where $t$ is the time period,
$r \in A$ is the Recognizer’s location, $i \in A$ is the Interceptor’s location, $\pi$ is a vector of
probabilities with components $\pi_a$, $a \in A$, where $\pi_a$ is the probability that a neutral is
present in AC $a$, and $\theta$ is a vector of probabilities with components $\theta_a$, $a \in A$, where
$\theta_a$ is the probability that a target is present in AC $a$. Let
$S = \{1, 2, \ldots, T\} \times A \times A \times [0,1]^{|A|} \times [0,1]^{|A|}$ be the space of all possible state vectors.
Although the two probability vectors $\pi$ and $\theta$ are continuous, they can take only a finite
number of values because the number of detection and interception opportunities in a
given time horizon is finite. Thus, our defined state space $S$ is of finite (though high)
cardinality.
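
For concreteness, the state vector can be pictured as a small record type. The sketch below is illustrative only: the class name `State`, the integer indexing of ACs, and the helper `grid_state_count` are assumptions introduced here. The helper counts the states that would result if each probability were restricted to a fixed grid of `levels` values, a discretization that is not part of the model above and serves only to convey how quickly the state space grows.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    """State s = (t, r, i, pi, theta); ACs are indexed 0..n-1 (an assumption)."""
    t: int                      # current time period
    r: int                      # Recognizer's AC
    i: int                      # Interceptor's AC
    pi: Tuple[float, ...]       # pi[a]: probability a neutral is present in AC a
    theta: Tuple[float, ...]    # theta[a]: probability a target is present in AC a

    def __post_init__(self):
        if len(self.pi) != len(self.theta):
            raise ValueError("pi and theta must have one entry per AC")


def grid_state_count(T, n_cells, levels):
    """Number of states if every probability took one of `levels` grid values."""
    return T * n_cells * n_cells * levels ** (2 * n_cells)


# Even a 4-cell AOI with three probability levels and 10 periods is already large.
print(grid_state_count(T=10, n_cells=4, levels=3))
```

Because the count grows exponentially in the number of ACs, a frozen (hashable) record such as this one is convenient as a dictionary key only for very small instances.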
2.  Decision


A decision $x$ in the scenario determines the next AC to be visited by the
Recognizer, thus $x \in A$. This decision is made either at $t = 0$ or when the existing
decision is fathomed. A decision $x \in A$ is said to be fathomed in one of the following
three situations: (i) no object is found by the Recognizer in AC $x$, (ii) an object is found
by the Recognizer but it is recognized as a neutral, or (iii) an object is intercepted. Note
that because a decision can only be made in one of the cases presented above (start of the
search or when a decision is fathomed), this dynamic programming model differs slightly
from standard models where decisions are made at a constant rate at every time step (see
Powell [8], pp. 132-135). In fact, the time interval between two consecutive decisions is
a random variable, which is defined in the next section. Since a decision is always made
based on the current state, we often write $x(s)$. In some cases, we drop the explicit
dependency of the decision $x$ on the state $s$ to simplify our expressions.

3.  Information 

Let $w$ denote the four-component random vector representing the information
obtained by the interdiction force once a decision is fathomed and the consequences of
that decision are realized. The time interval $\Delta t_w$ denotes the duration of the current
state transition, that is, the duration between the time period in which decision $x$ is taken
and the time period when $x$ is fathomed. The variables $r_w$ and $i_w$ denote the Recognizer’s
and Interceptor’s locations, respectively, at the end of the current state transition. The
fourth component is a Bernoulli random variable that is equal to 1 if a target has been
intercepted and 0 otherwise. Note that these four random variables are not independent.
Let $W$ denote the space of possible realizations of $w$. The explicit probability distribution
of $w$, which is a function of both the current state $s$ and the decision $x(s)$, is presented
later on. To represent this dependency, we write $w(s, x(s))$ to denote the random vector $w$
with probability distribution depending on the state $s$ and the decision $x(s)$. Our notation
does not distinguish between the random vector $w$ or its components and their
realizations. The meaning should be clear from the context.

4.  State Transition

The next state is determined using the state-transition function
$s^M(s, x(s), w(s, x(s)))$, which is a deterministic function of the current state, of the
decision made and of the information obtained following the realization of a decision:

$$s' = s^M\big(s, x(s), w(s, x(s))\big) \qquad (2.1)$$

The explicit function will be presented later.

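5.  Reward


The reward $c$ is a function of the information $w$ and the state $s$ and is defined in the
following way:

$$c : W \times S \to \mathbb{R} \qquad (2.2)$$

[Equation (2.3), the explicit expression for $c(w, s)$, is not recoverable from the source text.]

where $w$ is the information vector, $s = (t, r, i, \pi, \theta)$ is the state, and $T$ is the time
horizon of the scenario.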

The reward is 0 if no target is intercepted or if the time of interception is beyond the time
horizon, and the reward is 1 if a target is intercepted earlier than the end of the time
horizon, subject to a discount factor $\gamma$ for interception at later times. Choosing
a discount factor $\gamma = 0$ means that the same reward is collected for intercepting a target
now or later. A discount factor $\gamma > 0$ means that the reward for intercepting a target in the
near future is higher than for intercepting it at a later point in time. Note that the reward is
not a function of the state alone, but of the information obtained (after making a decision)
as well.
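
One functional form consistent with this description is sketched below. The exponential shape of the discount, the symbol $b_w$ for the Bernoulli interception indicator, and the symbol $\Delta t_w$ for the duration of the state transition are assumptions made here for illustration. The description above fixes only the qualitative behavior: no reward past the horizon, no discounting when $\gamma = 0$, and smaller rewards for later interceptions when $\gamma > 0$.

```latex
c(w, s) =
\begin{cases}
  b_w \, e^{-\gamma \,(t + \Delta t_w)}, & t + \Delta t_w \le T, \\
  0, & \text{otherwise,}
\end{cases}
\qquad b_w \in \{0, 1\}, \quad \gamma \ge 0 .
```

With $\gamma = 0$ this reduces to $c(w, s) = b_w$ whenever the interception is completed within the horizon, so every intercepted target is worth exactly 1.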
6.  Timeline


We assume the following timeline (Figure 3): once in a state $s$, and a decision has
just been fathomed, a new decision $x(s) \in A$ is made, after which information $w(s, x)$ is
obtained, and the new state $s'$ is derived by the deterministic function $s^M(s, x, w)$. As a
decision is not made at every time step, and information is not obtained at every time
step, we do not use the common notation of subscript $t$ for $s$, $x$, or $w$. In our notation $t$ is
treated as any other component of the state vector $s$.


Figure 3.  Timeline of the dynamic programming formulation


7.  Bellman  Equation 


We define the value of being in state $s$ as the maximum expected reward from
the time of being in state $s$ to the end of the time horizon $T$. This value $V(s)$ is given
by a Bellman equation:

$$V(s) = \begin{cases} \max_{x(s)} E\left[ c\big(w(s, x(s)), s\big) + V\big(s^M(s, x(s), w(s, x(s)))\big) \right], & t < T \\ 0, & t \ge T \end{cases} \qquad (2.4)$$

where $t$ is the time of state $s = (t, r, i, \pi, \theta)$. This equation can be written in a simplified
form:

$$V(s) = \begin{cases} \max_{x(s)} E\left[ c(w, s) + V\big(s^M(s, x, w)\big) \right], & t < T \\ 0, & t \ge T \end{cases} \qquad (2.5)$$

The optimal policy in a state $s$ is the optimal solution of the above Bellman equation.
B.  METHODOLOGY OUTLINE


Let $B$ denote the problem of finding the optimal policy, which maximizes the
reward obtained in a finite time horizon $T$ according to the states, decisions, information,
state transition and rewards presented above. This problem can theoretically be solved by
backward calculation of its states’ values using the Bellman equation (Equation 2.5). A
Backward Dynamic Programming algorithm (see Powell [8], p. 50) starts from the end of
the time horizon and backtracks to the present while calculating the value of each possible
state at each time period. We recall from (2.5) that the value of each state is the expected
reward obtained in the next state transition plus the expected value of the resulting next
state, which has already been calculated since we are moving backwards through time.
This sequential process requires the calculation of each state value for each time period in
the finite time horizon, and so it is crucial to have a small enough state space to be able to
practically implement this algorithm and solve a real-life problem. We encounter a
difficulty when we realize that the state space $S$ has a huge number of possible states,
mainly due to the probability vectors $\pi$ and $\theta$, which can take numerous values. This
computational difficulty renders our problem intractable, and so we turn to a different
method of producing an operational policy to be applied in the scenario.
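
The backward recursion can be sketched on a deliberately tiny example. The Python code below is a toy under strong assumptions and is not the model of this thesis: the state is reduced to (time, Recognizer location), each cell `a` holds a target with a fixed hypothetical probability `q[a]` on every visit, a visit lasts the travel time plus one period, and an interception earns reward 1 if it is completed by the horizon. Dropping the probability vectors from the state is exactly what makes this toy tractable.

```python
def backward_dp(T, q):
    """Backward dynamic programming on a toy search problem.

    T : finite time horizon
    q : q[a] = hypothetical probability of intercepting a target when visiting cell a
    Returns (V, policy), both keyed by (t, location).
    """
    cells = range(len(q))
    V = {}        # V[(t, loc)] = maximum expected reward from time t onward
    policy = {}   # policy[(t, loc)] = cell to visit next
    for t in range(T - 1, -1, -1):            # move backwards through time
        for loc in cells:
            best_val, best_x = float("-inf"), None
            for x in cells:
                dt = abs(loc - x) + 1          # travel time plus one search period
                reward = 1.0 if t + dt <= T else 0.0
                # Values at or beyond the horizon are never stored, so they default to 0.
                future = V.get((t + dt, x), 0.0)
                # Expectation over the two outcomes: interception or no interception.
                val = q[x] * (reward + future) + (1.0 - q[x]) * future
                if val > best_val:
                    best_val, best_x = val, x
            V[(t, loc)] = best_val
            policy[(t, loc)] = best_x
    return V, policy


V, policy = backward_dp(T=3, q=[0.2, 0.5])
print(V[(0, 0)], policy[(0, 0)])
```

Note that the transition duration `dt` varies with the decision, which mirrors the point made earlier that decisions are not taken at every time step.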

In this thesis, we suggest using an easy-to-calculate heuristic instead of solving
for the exact optimal policy, and we analyze the performance of this heuristic by
bounding the error between the heuristic's value and the optimal policy's value.
We construct a new problem $\bar{B}$, named the Upper Bound Problem, characterized by the
fact that the reward collected in this new problem is an upper bound on the reward
collected in our original problem $B$. The reward collected by any feasible policy in
problem $B$ is a lower bound on the reward collected in $B$ using the optimal policy, and so
by calculating the reward collected using our heuristic's policy in problem $B$ and solving
$\bar{B}$ for its optimal reward we are able to bound the optimal reward of our original
problem $B$ from below and from above. Furthermore, the difference between the optimal
reward collected in $\bar{B}$ and the reward collected using our heuristic's policy is an upper
bound on the difference between the optimal reward collected in the original problem $B$
and the reward collected using our heuristic's policy.

This thesis includes the complete development of $B$ and $\bar{B}$, a method of effectively
solving $\bar{B}$, a method of effectively calculating the reward collected using the heuristic's
policy, and a numerical analysis of several case-study examples.
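
The bounding argument can be illustrated with a few lines of Python. The function name and the numbers used below are hypothetical and serve only to show the arithmetic: if the heuristic's reward in the original problem is a lower bound on the optimal reward and the Upper Bound Problem's optimal reward is an upper bound on it, then the gap and the ratio between these two computable quantities bound the heuristic's true shortfall.

```python
def performance_bounds(heuristic_reward, upper_bound_reward):
    """Bounds implied by  heuristic <= optimal <= upper bound.

    Returns (max_gap, min_ratio):
      max_gap   -- the optimal reward exceeds the heuristic's by at most this much
      min_ratio -- the heuristic collects at least this fraction of the optimum
    """
    if heuristic_reward > upper_bound_reward:
        raise ValueError("the upper bound must not be below the heuristic's reward")
    max_gap = upper_bound_reward - heuristic_reward
    if upper_bound_reward > 0:
        min_ratio = heuristic_reward / upper_bound_reward
    else:
        min_ratio = 1.0                   # both rewards are zero
    return max_gap, min_ratio


# Made-up rewards, for illustration only.
gap, ratio = performance_bounds(heuristic_reward=3.0, upper_bound_reward=5.0)
print(gap, ratio)
```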

C.  SUGGESTED  HEURISTIC 


We  construct  a  simple  and  greedy  heuristic  for  the  Original  Problem  B ,  defined 
as  follows: 


(5)  G  argmax- 


jR 

r.a  i.a 


(2.6) 


where $s = (t, r, i, \pi, \theta)'$ is the current state, $r$ is the Recognizer's current
location, $i$ is the Interceptor's current location, $T^R_{r,a}$ is the time it takes the
Recognizer to reach AC $a$ from its current location $r$, $T^I_{i,a}$ is the time it takes the
Interceptor to reach AC $a$ from its current location $i$, and $\theta_a(T^R_{r,a}, \theta)$ is
the estimated probability of a target in AC $a$ at the time the Recognizer reaches AC $a$ if
the Recognizer decides to visit that AC. We add one time period for the Recognizer's
identification process (the time it takes to identify a detected object as either a likely
target or a likely neutral while tracking it). This is an attempt to estimate a normalized
value of each potential AC to visit next, by dividing the probability of a target being in
that AC by a rough estimate of the time it will take to detect and intercept it. Several ACs
may tie in maximizing the above expression, in which case we choose arbitrarily among them.
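
The selection rule described above can be sketched in Python. This is only an illustration: the function and argument names are hypothetical, and the way the two travel times are combined into a single time estimate (the default `t_r + 1 + t_i` below) is an assumption of this sketch rather than the expression in (2.6), so it is left as a replaceable parameter.

```python
def greedy_heuristic(theta_at_arrival, t_recognizer, t_interceptor, time_estimate=None):
    """Pick the AC with the largest target probability per unit of estimated time.

    theta_at_arrival -- {AC: estimated target probability when the Recognizer arrives}
    t_recognizer     -- {AC: Recognizer travel time to that AC}
    t_interceptor    -- {AC: Interceptor travel time to that AC}
    time_estimate    -- function (t_r, t_i) -> estimated time to detect and intercept
    """
    if time_estimate is None:
        # ASSUMPTION: travel times are added, plus one period for identification.
        time_estimate = lambda t_r, t_i: t_r + 1 + t_i
    scores = {
        a: theta_at_arrival[a] / time_estimate(t_recognizer[a], t_interceptor[a])
        for a in theta_at_arrival
    }
    best = max(scores.values())
    # Ties are broken arbitrarily; here the smallest AC label wins.
    choice = min(a for a, v in scores.items() if v == best)
    return choice, scores


# Made-up inputs for three ACs.
choice, scores = greedy_heuristic(
    theta_at_arrival={1: 0.3, 2: 0.5, 3: 0.5},
    t_recognizer={1: 0, 2: 2, 3: 4},
    t_interceptor={1: 1, 2: 2, 3: 0},
)
print(choice, scores)
```

With these inputs the nearby AC wins even though its target probability is lower, which is the normalization effect the heuristic is meant to capture.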


D.  REWARD  CALCULATION  USING  HEURISTICS  POLICY 


This reward is calculated by running forward in time in the original
problem $B$ while following our heuristic decision policy. The Bellman equation used in
this calculation is:

$$ V^H(s) = \begin{cases} E\left[ c(w,s) + V^H\left( s^M\left(s, x^H(s), w\right) \right) \right], & t < T \\ 0, & t \ge T \end{cases} \tag{2.7} $$

which is the Original Problem's Bellman equation without the maximization over the
possible decisions $x$; instead, $x$ is simply chosen by our heuristic, $x = x^H(s)$. We use
$V^H(s)$ to denote the value of each state following our heuristic decision policy, which is
different from the optimal value of each state using the original Bellman equation (2.5).
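
A minimal Python sketch of evaluating a fixed decision rule, as opposed to maximizing over decisions, is given below. The toy problem, the two example policies and all names are hypothetical; the only feature borrowed from the text is that a decision may take more than one time period before it is fathomed, so the recursion advances time by a decision-dependent duration.

```python
from functools import lru_cache


def evaluate_policy(policy, transitions, horizon):
    """Value of following a fixed policy: no maximization over decisions.

    policy(t, s)       -> decision taken in state s at time t
    transitions(s, x)  -> list of (probability, reward, duration, next_state),
                          with duration >= 1 time period
    """
    @lru_cache(maxsize=None)
    def value(t, s):
        if t >= horizon:                  # past the horizon nothing more is collected
            return 0.0
        x = policy(t, s)
        return sum(p * (c + value(t + dt, s2))
                   for p, c, dt, s2 in transitions(s, x))
    return value


def toy_transitions(s, x):
    """Hypothetical two-cell example; moving between cells takes two periods."""
    hit = {0: 0.2, 1: 0.6}[s]
    if x == "search":
        return [(hit, 1.0, 1, s), (1.0 - hit, 0.0, 1, s)]
    return [(1.0, 0.0, 2, 1 - s)]


always_search = evaluate_policy(lambda t, s: "search", toy_transitions, horizon=3)
go_to_cell_1 = evaluate_policy(lambda t, s: "move" if s == 0 else "search",
                               toy_transitions, horizon=3)
print(always_search(0, 0), go_to_cell_1(0, 0))
```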

E.  UPPER  BOUND  PROBLEM 

We define an Upper Bound Problem $\bar{B}$ that is a simplified version of our Original
Problem $B$. The main difference between the two problems is that in $\bar{B}$ the interdiction
force experiences a “situational awareness memory loss” after each decision is fathomed.
Specifically, each time a decision has been fathomed, we “reset” the two probability
vectors $\pi$ and $\theta$ to their initial values at time $t = 0$. Because the probability vectors $\pi$
and $\theta$ are both the same each time a decision is made, we can exclude them from the
definition of the state in the Upper Bound Problem $\bar{B}$. This simplified state space makes
the Upper Bound Problem tractable and allows us to solve it in reasonable time. The
above “memory loss” property also means that we allow the interdiction force to
potentially re-collect the same rewards over and over again. In the Original Problem,
each time the Recognizer visits an AC, we set the probability that the AC contains a
target and the probability that the AC contains a neutral to 0 (a comprehensive discussion of
these probability updates appears in Chapter III.A.2). Our definition of the state in the
Upper Bound Problem does not include any history of previously visited ACs.
Specifically, the Recognizer does not “remember” that the probabilities of a target and a
neutral being in each visited AC should be set to 0 at the time of that visit. Explicitly, by
not setting to 0 the probability that a visited AC contains another target once it has been
visited, all ACs in the AOI have a probability of containing a target that is greater than or
equal to the probability they should truly have (as in the Original Problem). Instead of
keeping track of the most updated $\pi$ and $\theta$ vectors, as is done in the Original Problem $B$,
each time we make a decision we assume we are back at the initial steady state of the
probability vectors $\pi$ and $\theta$. These steady-state probability vectors are discussed later,
with explicit formulas appearing in (3.1) and (3.2). Having this “memory loss” property and
assuming we always make a decision in steady state, we are at risk of getting “trapped” in
the same AC (one which probably has a relatively high probability of containing a target)
just because we do not take into account the fact that we have just visited this AC, and as
such the probability that another target has just entered this AC is likely to be less than the
high steady-state probability of a target being in this AC. In an attempt to avoid these
unwanted cases we temporarily update the probability of another object in the AC the
Recognizer is currently in at the time of making each decision (i.e., drop both the
probability of a target in this AC and the probability of a neutral in this AC down to 0).
This temporary update exists only until the current decision is fathomed. Once we
complete the current state transition and end up in the next state we “forget” this
temporary update and assume we are back at steady state.


As discussed above, the Upper-Bound Problem's state is slightly different from
the Original Problem's state and is defined as follows:

$$ \bar{s} = (t, r, i)' \tag{2.8} $$

where $(t, r, i)$ are the time, the Recognizer's location and the Interceptor's location (the
same as in the Original Problem $B$), and the probability vectors $\pi$ and $\theta$ have been omitted.
The Upper-Bound Problem's Bellman equation can now be written:

$$ \bar{V}(\bar{s}) = \begin{cases} \max_{x} E\left[ \bar{c}(\bar{w}, \bar{s}) + \bar{V}\left( \bar{s}^M(\bar{s}, \bar{w}) \right) \right], & t < T \\ 0, & t \ge T \end{cases} \tag{2.9} $$

which seems almost identical to the Original Problem's Bellman equation (2.5), but of
course differs in the actual definitions and meaning of the variables and functions it
incorporates (e.g., the definition of the state $\bar{s}$, the probability distribution of the
information $\bar{w}$, etc.). We use a “bar” to denote these variables and functions in the
Upper-Bound Problem $\bar{B}$, which differ from their counterparts in the Original Problem $B$.
Another subtle difference is that the transition function $\bar{s}^M$ is not directly related to $x$
but only through $\bar{w}$, as presented later in (3.54). A more comprehensive discussion of the
Upper Bound Problem $\bar{B}$ is presented in Chapter III.B.
III. MODELS DEVELOPMENT: DETAILS


A. ORIGINAL PROBLEM

1. Probabilities and Object Recognition Techniques

In  this  section,  we  present  the  notation  and  formulas  describing  the  calculation 
and  update  of  several  variables  and  probability  distributions  needed  for  our  model.  The 
steady-state  probabilities  of  a  neutral  and  a  target  at  each  AC  comprise  the  baseline  at  the 
beginning  of  the  scenario  (though  we  could  start  with  non  steady-state  probability  vectors 
just  as  well).  As  the  scenario  progresses,  these  steady-state  probabilities  will  no  longer  be 
relevant  and  we  will  have  to  keep  track  of  an  updated  probability  map  of  neutrals  and 
targets  at  each  AC  in  the  AOI.  These  probabilities  will  change  as  a  result  of  the  sensors’ 
observations  and  the  Markov  transition  probabilities  of  the  objects. 

a.  Initial  Pre-Search  Probabilities:  Uninterrupted  Steady-State 

Let $P = [P(a', a)]$, $a', a \in A \cup A_0$, be the Markov transition matrix for
any neutral in the AOI, with $P(a', a)$ representing the probability of a single time-step
transition from AC $a'$ to AC $a$. Let $Q = [Q(a', a)]$, $a', a \in A \cup A_0$, be the Markov
transition matrix for any target in the AOI, with $Q(a', a)$ representing the probability of a
single time-step transition from AC $a'$ to AC $a$. Let $\alpha_l$ denote the single time-step
probability of a neutral entry to AC $l \in E$, and let $\beta_l$ denote the single time-step
probability of a target entry to AC $l \in E$.

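Absent any other information (such as sensor observations), and based
on our assumption that the probability of more than one object in a cell is negligible, the
steady-state probability of a neutral in AC $a \in A$ is approximated by

$$ \pi_a^0 = \begin{cases} 1 - (1 - \alpha_a) \prod_{l \in E} \prod_{k=1}^{\infty} \left( 1 - \alpha_l P^k(l, a) \right), & a \in E \\ 1 - \prod_{l \in E} \prod_{k=1}^{\infty} \left( 1 - \alpha_l P^k(l, a) \right), & a \notin E \end{cases} \tag{3.1} $$

where $P^k(l, a)$ is the $(l, a)$ entry of $P^k$, the transition matrix $P$ raised to the $k$th power,
and the superscript 0 denotes steady-state probabilities. Similarly, the steady-state
probability of a target in AC $a \in A$ is approximated by

$$ \theta_a^0 = \begin{cases} 1 - (1 - \beta_a) \prod_{l \in E} \prod_{k=1}^{\infty} \left( 1 - \beta_l Q^k(l, a) \right), & a \in E \\ 1 - \prod_{l \in E} \prod_{k=1}^{\infty} \left( 1 - \beta_l Q^k(l, a) \right), & a \notin E \end{cases} \tag{3.2} $$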
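
As a rough numerical illustration of the steady-state product formula, the Python sketch below truncates the infinite product after a fixed number of terms. It assumes the formula reads $1 - (1-\alpha_a)\prod_{l \in E}\prod_{k \ge 1}(1 - \alpha_l P^k(l,a))$ for an entry AC, with the factor $(1-\alpha_a)$ dropped for a non-entry AC. The representation of $P$ (restricted to the ACs inside the AOI, with the missing row mass standing for departure to $A_0$), the truncation length and the two-cell example are assumptions of this sketch.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def steady_state(P, entry_prob, n_terms=200):
    """Truncated steady-state probability of one object type in each AC.

    P          -- transition matrix restricted to the ACs in A; a row may sum
                  to less than 1, the remainder being the probability of
                  leaving the AOI (assumed representation)
    entry_prob -- {entry AC index: single time-step entry probability}
    n_terms    -- number of powers P^1, ..., P^n_terms kept in the product
    """
    n = len(P)
    prod = [1.0] * n
    P_k = [row[:] for row in P]                       # P^1
    for _ in range(n_terms):
        for a in range(n):
            for l, alpha in entry_prob.items():
                prod[a] *= 1.0 - alpha * P_k[l][a]
        P_k = mat_mul(P_k, P)                         # next power of P
    # An entry AC may also receive a fresh arrival with probability alpha_a.
    return [1.0 - (1.0 - entry_prob.get(a, 0.0)) * prod[a] for a in range(n)]


# Hypothetical corridor: objects enter at AC 0, move to AC 1, then leave the AOI.
P = [[0.0, 1.0],
     [0.0, 0.0]]
pi0 = steady_state(P, {0: 0.1})
print(pi0)
```

Because objects eventually leave the AOI, the powers of the restricted matrix shrink toward zero and the truncated product converges.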


Based  on  our  assumptions,  we  identify  three  mutually  exclusive  and 
collectively  exhaustive  events  associated  with  an  AC  a  e  A ;  (i)  void  of  objects,  denoted 
,  (ii)  contains  one  neutral,  denoted  ,  and  (hi)  contains  one  target,  denoted  E  •  The 
event denote  a  situation  where  there  is  an  object,  of  any  kind,  in  AC  a,  that  is, 
'  ^ur  Spatial  and  temporal  assumptions  lead  us  to  the  following 
approximated  results  regarding  the  steady-state; 


(3.3) 

Pr{7;}«0:  (3.4) 

pr  { r,  H  ( 1  -  )  (i  -  ) = 1  -  (3.5) 

b.  Markov  Updates 

When about to make a decision regarding the next AC to investigate, the
Recognizer first needs to estimate the probability of a target and the probability of a
neutral at each of the ACs in the AOI at the time it would arrive at the designated AC.
Let $t'$ denote the time a decision is made and $\pi^{t'}$ the probability vector of a neutral in each
AC at time $t'$. For any time $t \ge t'$ we can calculate the new probability vector due to the
progress of time, absent any new information about the AOI (i.e., no sensor
observations). Let $\pi^t$ denote this new probability vector:

$$ \pi^t = \pi\left( t - t', \pi^{t'} \right) \tag{3.6} $$

where $\pi(\cdot, \cdot)$ is the function which updates the probability vector of a neutral at
each AC. Note that this function does not depend on both time arguments but only on
their difference. This function appears in several other places in this thesis. The trivial
case is when no time has passed (i.e., $t = t'$), in which case there is no update and so we
get that $\pi^t = \pi(0, \pi^{t'}) = \pi^{t'}$. Also note that initially, absent any sensor information,
the system is in steady state and therefore $\pi^t = \pi(t, \pi^0) = \pi^0$ for all $t \ge 0$. We need
to distinguish between four non-trivial cases: (i) single time-step and $a \in E$, (ii) two or
more time-steps and $a \in E$, (iii) single time-step and $a \notin E$, (iv) two or more time-steps
and $a \notin E$:

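$$ \pi_a^t = \begin{cases}
\pi_a^{t'}, & t = t' \\
1 - (1 - \alpha_a) \prod_{a' \in A} \left( 1 - \pi_{a'}^{t'} P(a', a) \right), & t = t' + 1,\ a \in E \\
1 - \prod_{a' \in A} \left( 1 - \pi_{a'}^{t'} P(a', a) \right), & t = t' + 1,\ a \notin E \\
1 - (1 - \alpha_a) \prod_{a' \in A} \left( 1 - \pi_{a'}^{t'} P^{t-t'}(a', a) \right) \prod_{l \in E} \prod_{k=1}^{t-t'-1} \left( 1 - \alpha_l P^k(l, a) \right), & t \ge t' + 2,\ a \in E \\
1 - \prod_{a' \in A} \left( 1 - \pi_{a'}^{t'} P^{t-t'}(a', a) \right) \prod_{l \in E} \prod_{k=1}^{t-t'-1} \left( 1 - \alpha_l P^k(l, a) \right), & t \ge t' + 2,\ a \notin E
\end{cases} \tag{3.7} $$

where $P^k$ is the neutrals' Markov transition probability matrix raised to the $k$th power.

Similarly we define $\theta_a^t = \theta_a\left( t - t', \theta^{t'} \right)$ for the targets:

$$ \theta_a^t = \begin{cases}
\theta_a^{t'}, & t = t' \\
1 - (1 - \beta_a) \prod_{a' \in A} \left( 1 - \theta_{a'}^{t'} Q(a', a) \right), & t = t' + 1,\ a \in E \\
1 - \prod_{a' \in A} \left( 1 - \theta_{a'}^{t'} Q(a', a) \right), & t = t' + 1,\ a \notin E \\
1 - (1 - \beta_a) \prod_{a' \in A} \left( 1 - \theta_{a'}^{t'} Q^{t-t'}(a', a) \right) \prod_{l \in E} \prod_{k=1}^{t-t'-1} \left( 1 - \beta_l Q^k(l, a) \right), & t \ge t' + 2,\ a \in E \\
1 - \prod_{a' \in A} \left( 1 - \theta_{a'}^{t'} Q^{t-t'}(a', a) \right) \prod_{l \in E} \prod_{k=1}^{t-t'-1} \left( 1 - \beta_l Q^k(l, a) \right), & t \ge t' + 2,\ a \notin E
\end{cases} \tag{3.8} $$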
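
The time update of a probability vector can be sketched in Python as follows. The sketch assumes the update has the product form used for the steady state: the objects already present are propagated over the whole interval, objects that may have entered during the interval are propagated for the remaining steps, and an entry AC may receive one more arrival in the last step. As before, the matrix is assumed to be restricted to the ACs inside the AOI, and all names and the example are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def markov_update(dt, prior, P, entry_prob):
    """Probability vector dt time steps after `prior`, absent observations.

    dt         -- number of elapsed time steps (t - t'), dt >= 0
    prior      -- list of per-AC probabilities at time t'
    P          -- transition matrix restricted to the ACs in A (assumed form)
    entry_prob -- {entry AC index: single time-step entry probability}
    """
    if dt == 0:
        return list(prior)                            # trivial case: no update
    n = len(P)
    powers = [P]                                      # powers[k-1] holds P^k
    for _ in range(dt - 1):
        powers.append(mat_mul(powers[-1], P))
    P_dt = powers[-1]
    updated = []
    for a in range(n):
        prod = 1.0
        for a2 in range(n):                           # objects present at time t'
            prod *= 1.0 - prior[a2] * P_dt[a2][a]
        for k in range(1, dt):                        # objects that entered since t'
            for l, alpha in entry_prob.items():
                prod *= 1.0 - alpha * powers[k - 1][l][a]
        # possible arrival directly into an entry AC during the last step
        updated.append(1.0 - (1.0 - entry_prob.get(a, 0.0)) * prod)
    return updated


# Hypothetical corridor: objects enter at AC 0, move to AC 1, then leave the AOI.
P = [[0.0, 1.0],
     [0.0, 0.0]]
print(markov_update(1, [0.5, 0.0], P, {0: 0.1}))
print(markov_update(2, [0.1, 0.1], P, {0: 0.1}))
```

In this example the vector `[0.1, 0.1]` is left unchanged by the update, which is the behaviour expected of a steady state.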
c. Recognizer Updates


In  this  section,  we  discuss  the  probability  updates  generated  by  the 
Recognizer  while  detecting  and  tracking  an  object.  These  updates  eventually  determine 
whether  or  not  the  Recognizer  calls  in  the  Interceptor  to  intercept  the  suspicious  object. 


Recall that the Recognizer has perfect detection capabilities (i.e., detecting
whether an AC is empty or not). Let $x \in A$ denote the AC the Recognizer has decided to
investigate next. If the Recognizer arrives at time $t$ in AC $x$ and finds no object there,
then the current decision (search AC $x$ at time $t$) is fathomed and a new decision needs to
be made. If an object is detected in AC $x$, then the immediate probability updates are:


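$$ \pi_x^{t,Det} = \frac{\pi_x^t}{\pi_x^t + \theta_x^t} \tag{3.9} $$

$$ \theta_x^{t,Det} = \frac{\theta_x^t}{\pi_x^t + \theta_x^t} = 1 - \pi_x^{t,Det} \tag{3.10} $$

where $\pi_x^{t,Det}$ and $\theta_x^{t,Det}$ are the probabilities of the detected object being a neutral or a
target, respectively, and $\pi_x^t$ and $\theta_x^t$ are the corresponding prior probabilities.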


If an object is detected in AC $x$ at time $t$, the Recognizer continues
tracking the object for one more time period $(t + 1)$ in which the tracked object either
moves to AC $j \in A$, leaves the AOI to AC $A_0$ or stays at AC $x$. Note that the object
might stay at AC $x$ or leave the AOI to $A_0$, according to the specific Markov transition
matrices $P$ and $Q$. If the object has left the AOI, the Recognizer ends up back in AC $x$ at
the end of the tracking time period and the decision is fathomed. After tracking the
detected object (assuming it did not leave the AOI), the Recognizer decides if it is likely
to be a neutral or a target. While tracking the object, the Recognizer only focuses on the
tracked object and is not searching in other ACs.

During tracking, the Recognizer utilizes two modes for recognizing the
tracked object: signature recognition (e.g., electro-optical sensor, radar, etc.) and
movement recognition, in which the Recognizer tries to identify the movement pattern of
the tracked object (i.e., leaving known shipping lanes or any other suspicious movement).
Both recognition techniques take place within the tracking time period. Without loss of
generality, we assume that signature recognition takes place first.


During the signature recognition mode, the Recognizer takes $g$ looks
(glimpses) at the tracked object where the glimpses are conditionally independent given
the presence of the object. Let $1 - u$ denote the single-glimpse false negative probability
of the Recognizer identifying a target as a neutral, and let $1 - v$ denote the single-glimpse
false positive probability of the Recognizer identifying a neutral as a target. Suppose that
$n$ glimpses report “neutral” and $g - n$ glimpses report “target.” Recall that $j \in A$ is the
AC to which the tracked object has moved while being tracked. Let $\pi_j^{t+1,Sig}$ denote the
posterior probability of a neutral following the $g$ glimpses of the signature recognition
process. We have:

$$ \pi_j^{t+1,Sig} = \frac{\binom{g}{n} v^n (1 - v)^{g-n} \, \pi_x^{t,Det}}{\binom{g}{n} v^n (1 - v)^{g-n} \, \pi_x^{t,Det} + \binom{g}{n} (1 - u)^n u^{g-n} \, \theta_x^{t,Det}} \tag{3.11} $$

and similarly, the posterior probability of a target is:

$$ \theta_j^{t+1,Sig} = \frac{\binom{g}{n} (1 - u)^n u^{g-n} \, \theta_x^{t,Det}}{\binom{g}{n} v^n (1 - v)^{g-n} \, \pi_x^{t,Det} + \binom{g}{n} (1 - u)^n u^{g-n} \, \theta_x^{t,Det}} = 1 - \pi_j^{t+1,Sig} \tag{3.12} $$

Note that the factors $\binom{g}{n}$ cancel out.


Finally, the movement recognition mode takes into account the fact that
the object has moved from AC $x$ to AC $j$. Taking the posteriors of the signature
recognition mode as priors for the movement recognition mode we obtain the probability
of a neutral:

$$ \pi_j^{t+1,Rec} = \frac{P(x, j) \, \pi_j^{t+1,Sig}}{P(x, j) \, \pi_j^{t+1,Sig} + Q(x, j) \, \theta_j^{t+1,Sig}} \tag{3.13} $$

Similarly, the posterior probability of a target is:

$$ \theta_j^{Rec} \equiv \theta_j^{t+1,Rec} = \frac{Q(x, j) \, \theta_j^{t+1,Sig}}{P(x, j) \, \pi_j^{t+1,Sig} + Q(x, j) \, \theta_j^{t+1,Sig}} = 1 - \pi_j^{t+1,Rec} \tag{3.14} $$

where we introduce $\theta_j^{Rec}$ as a short-hand notation for $\theta_j^{t+1,Rec}$, which we use later.
Once tracking is complete, the Recognizer compares the probability in
(3.14)  to  a  predetermined  threshold  M  and  decides  whether  to  flag  the  tracked  object  as  a 
target  or  to  let  it  go  and  decide  on  the  next  AC  to  visit.  If  this  threshold  is  met,  the 
Interceptor  is  called  for  interception  while  the  Recognizer  keeps  watching  the  object  until 
the  arrival  of  the  Interceptor. 
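
The chain of updates described in this section (detection, signature recognition, movement recognition, and comparison with the threshold) can be sketched in Python as below. The function names and the numerical inputs are hypothetical. The sketch assumes each stage is a plain Bayes update: normalization of the two priors on detection, binomial glimpse likelihoods for the signature stage (the binomial coefficient is omitted because it cancels), and the transition probabilities $P(x,j)$ and $Q(x,j)$ as likelihoods for the movement stage.

```python
def detection_update(pi_x, theta_x):
    """Condition the neutral/target priors on an object having been detected."""
    total = pi_x + theta_x
    return pi_x / total, theta_x / total


def signature_update(pi_det, theta_det, n, g, u, v):
    """Posterior after g glimpses, n of which reported "neutral".

    u -- probability that a single glimpse at a target reports "target"
    v -- probability that a single glimpse at a neutral reports "neutral"
    """
    like_neutral = v ** n * (1.0 - v) ** (g - n)
    like_target = (1.0 - u) ** n * u ** (g - n)
    denom = like_neutral * pi_det + like_target * theta_det
    return like_neutral * pi_det / denom, like_target * theta_det / denom


def movement_update(pi_sig, theta_sig, p_xj, q_xj):
    """Posterior after observing the move from AC x to AC j."""
    denom = p_xj * pi_sig + q_xj * theta_sig
    return p_xj * pi_sig / denom, q_xj * theta_sig / denom


def flag_as_target(theta_rec, threshold):
    """The Interceptor is called when the target posterior exceeds the threshold."""
    return theta_rec > threshold


# Made-up numbers: both of two glimpses report "target".
pi_det, theta_det = detection_update(0.3, 0.1)
pi_sig, theta_sig = signature_update(pi_det, theta_det, n=0, g=2, u=0.8, v=0.9)
pi_rec, theta_rec = movement_update(pi_sig, theta_sig, p_xj=0.5, q_xj=0.5)
print(theta_det, theta_sig, theta_rec, flag_as_target(theta_rec, 0.9))
```

In this example the movement stage leaves the posterior unchanged because neutrals and targets are assumed equally likely to make the observed move.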

2.  The  State-Transition  Function 


We recall that the state-transition function $s^M(s, x, w)$ is a deterministic function of the
current state $s$, of the decision to visit AC $x$, and of the information revealed once $x$ is
fathomed. We define the state-transition function in the following way:

(3.15) 


$$ s^M(s, x, w) = \left( t + \Delta t_w,\ r_w,\ i_w,\ \pi^M,\ \theta^M \right)' \tag{3.16} $$

where $w = (\Delta t_w, r_w, i_w, z_w)'$ is the information, $s = (t, r, i, \pi, \theta)'$ is the current
state, and $\pi^M$ and $\theta^M$ are components of $s^M$ representing the probability vectors $\pi$ and
$\theta$ of the next state. In computing $\pi^M$ and $\theta^M$ we use (3.7) and (3.8). There are three
periods of time we potentially need to account for (Figure 4): (i) the time between making
the decision to go to AC $x$ and the time of arrival at $x$, (ii) the tracking time of the detected
object, and (iii) the waiting time for the Interceptor to arrive and intercept the object.
Figure 4. Timeline of state transitions

For any decision $x \in A$, the time the Recognizer arrives at AC $x$ is $t + T^R_{r,x}$. Let $\hat{\pi}$
and $\hat{\theta}$ denote the resulting probability vectors of a neutral and a target, respectively, in
each AC in the AOI at time $t + T^R_{r,x}$. For each vector component corresponding to an
AC $a \in A$ we get:

$$ \hat{\pi}_a = \begin{cases} \pi_a\left( T^R_{r,x}, \pi \right), & a \ne x \\ 0, & a = x \end{cases} \tag{3.17} $$

$$ \hat{\theta}_a = \begin{cases} \theta_a\left( T^R_{r,x}, \theta \right), & a \ne x \\ 0, & a = x \end{cases} \tag{3.18} $$

where we set $\hat{\pi}_x$ and $\hat{\theta}_x$ to 0 for one of two possible reasons: either no object has been
detected in AC $x$, or an object was detected but, because it will be tracked and handled
separately, we can now assume the probability of another object in AC $x$ is 0.

We now need to account for the time of tracking a detected object, if one has
indeed been detected. Let $\mathcal{H}$ denote an operator that operates on a probability vector
(either $\pi$ or $\theta$). The operator updates the probability vector it operates on to the appropriate
value one time step later, and then sets to 0 the probability component of that vector which
corresponds to the current Recognizer location. Applying the operator to a probability
vector produces another probability vector. Let $(\mathcal{H}\pi)_a$ denote the component of the
resulting probability vector $\mathcal{H}\pi$ corresponding to AC $a \in A$. We can formally define the
operation of $\mathcal{H}$ in the following way:

$$ (\mathcal{H}\pi)_a = \begin{cases} \pi_a(1, \pi), & \text{if } a \text{ is not the AC to which the tracked object moved} \\ 0, & \text{if } a \text{ is the AC to which the tracked object moved} \end{cases} \tag{3.19} $$

$$ (\mathcal{H}\theta)_a = \begin{cases} \theta_a(1, \theta), & \text{if } a \text{ is not the AC to which the tracked object moved} \\ 0, & \text{if } a \text{ is the AC to which the tracked object moved} \end{cases} \tag{3.20} $$

Recall that the duration of a tracking phase is always a single time step. When
arriving at AC $r_w$ we need a probability update similar to the one we had for the
time it takes the Recognizer to reach AC $x$. If the tracked object is eventually flagged as a
likely target, we need a similar probability update for each time step until the
decision is fathomed. In both cases, when the tracked object is flagged as a likely target
and when it is not, the overall remaining time between the time the Recognizer reaches
AC $x$ and the time the decision is ultimately fathomed is $\Delta t_w - T^R_{r,x}$.
For each time step until the decision is fathomed, we need to update the probabilities of a
target and a neutral at each AC using $\mathcal{H}$. We can now explicitly write $\pi^M$ and $\theta^M$:

$$ \pi^M = \mathcal{H}^k \hat{\pi} \tag{3.21} $$

$$ \theta^M = \mathcal{H}^k \hat{\theta} \tag{3.22} $$

where $\mathcal{H}^k \hat{\pi}$ denotes operating $\mathcal{H}$ for $k$ times on the probability vector $\hat{\pi}$, $\mathcal{H}^k \hat{\theta}$ denotes
operating $\mathcal{H}$ for $k$ times on the probability vector $\hat{\theta}$, and we use $k = \Delta t_w - T^R_{r,x}$. In the
case that $k = \Delta t_w - T^R_{r,x} = 0$, the operator is the identity operator.
3. Information Probability Mass Function


We need the probability mass function of the obtained information $w$ for several
reasons. Firstly, in order to calculate the Bellman equation for the Original
Problem $B$ (2.5), we need to compute the expected value (with respect to $w$) of the sum of
the reward and the value of the next state. As discussed earlier (Chapter II.B), such a
direct calculation is intractable, and so the approach taken in this thesis is to approximate
the expected value using lower and upper bounds. Still, we present the formulas for direct
calculation of the Original Problem's Bellman equation in Chapter III.A.4. Another
reason we need the probability mass function of $w$ is for the simulative calculation of the
Bellman equation using our suggested heuristic, as later discussed in Chapter III.C. In
this simulative calculation we need to generate multiple realizations of the obtained
information $w$ according to its probability mass function. Furthermore, the following
derivation of the probability mass function of $w$ serves as the basis for the derivation of
the probability mass function of the obtained information $\bar{w}$ in the Upper-Bound
Problem $\bar{B}$.

Recall that the information vector $w$ describes the tactical consequences of a
decision to visit a certain AC $x$: the time until the decision is fathomed, the locations of
the Recognizer and Interceptor when this happens and whether or not a target has
eventually been intercepted. While $w = (\Delta t_w, r_w, i_w, z_w)'$ is an end result, it can be
equivalently described in terms of the events that occur during $\Delta t_w$. Following a decision
$x$ there are five possible scenarios which may occur: (i) no object is detected in AC $x$,
(ii) an object is detected in AC $x$ and it leaves the AOI (moves to $A_0$) while being tracked,
(iii) an object is detected at AC $x$, moves to AC $j \in A$ and is flagged as a neutral, (iv) an
object is detected at AC $x$, moves to AC $j \in A$, is flagged as a target, and when intercepted
is identified as a neutral, and (v) an object is detected at AC $x$, moves to AC $j \in A$, is
flagged as a target, and when intercepted, it is confirmed as a target.
Based on these five possible scenarios, we can explicitly write the possible
values $w = (\Delta t_w, r_w, i_w, z_w)'$ can take:


w 

V  y 


{r,>i,x,f,o)', 


if  no  object  is  detected  in  AC  x 

if  an  object  is  detected  in  AC  x,  then  moves  to 

if  an  object  is  detected  in  AC  x,  moves  to  AC  j  &  A  and 
is  flagged  as  a  likely  neutral 

if  an  object  is  detected  in  AC  x,  moves  to  AC  j  e  A,  flagged 
as  a  likely  target,  intercepted  but  turns  out  to  be  a  neutral 
if  an  object  is  detected  in  AC  x,  moves  to  AC  j  e  A,  flagged 
as  a  likely  target,  intercepted  and  is  indeed  a  target 


(3.23) 


To derive the probability mass function of $w$ we need to derive the probability of
each of the above scenarios. To do that, we first define two new random variables:

$$ d = \begin{cases} -1, & \text{if no object is detected in AC } x \\ 0, & \text{if an object is detected in AC } x \text{ and while being tracked it moves to } A_0 \\ j, & \text{if an object is detected in AC } x \text{ and while being tracked it moves to AC } j \in A \end{cases} \tag{3.24} $$

$$ f = \begin{cases} -1, & \text{if no object is detected in AC } x \text{ (and so nothing is flagged)} \\ 0, & \text{if an object is detected in AC } x \text{ and after tracking it is not flagged as a likely target} \\ 1, & \text{if an object is detected in AC } x \text{ and after tracking it is flagged as a likely target} \end{cases} \tag{3.25} $$


Note that $f = 0$ can either imply that the tracked object is flagged as a likely
neutral or that it has left the AOI and so it is not flagged at all. We also recall (see
Chapter III.A.1.a) that $T_x$ denotes the event of AC $x$ containing a target at the time the
Recognizer reaches AC $x$, and that $N_x$ denotes the event of AC $x$ containing a
neutral at the time the Recognizer reaches AC $x$. Rewriting (3.23) in terms of $d$, $f$, $N_x$
and $T_x$, the five scenarios correspond to the following events:

$$ \begin{array}{ll} (i) & f = -1,\ d = -1 \\ (ii) & f = 0,\ d = 0 \\ (iii) & f = 0,\ d = j, \quad \forall j \in A \\ (iv) & f = 1,\ d = j,\ N_x, \quad \forall j \in A \\ (v) & f = 1,\ d = j,\ T_x, \quad \forall j \in A \end{array} \tag{3.26} $$
Next we compute the probability of each of the events on the right-hand side of
(3.26).

The first event (i) is the simplest to compute (note that $f = -1 \Leftrightarrow d = -1$):

$$ \Pr\{f = -1, d = -1\} = 1 - \pi_x\left( T^R_{r,x}, \pi \right) - \theta_x\left( T^R_{r,x}, \theta \right) \tag{3.27} $$

where $s = (t, r, i, \pi, \theta)'$ is the current state.

The second event (ii) is also relatively simple to compute (note
that $d = 0 \Rightarrow f = 0$):

$$ \Pr\{f = 0, d = 0\} = \Pr\{d = 0\} = P(x, A_0) \, \pi_x\left( T^R_{r,x}, \pi \right) + Q(x, A_0) \, \theta_x\left( T^R_{r,x}, \theta \right) \tag{3.28} $$

To compute the probabilities of the other three events in (3.26), we first recall that
$\theta_j^{Rec}$ denotes the posterior probability, at the end of the tracking phase, that the tracked
object is a target. Also recall that a tracked object is flagged as a target if $\theta_j^{Rec} > M$,
where $M$ is the flagging threshold.

We defer the computation of event (iii) until after we compute events (iv) and (v).

The following derivation is true for all $j \in A$:

$$ \begin{aligned} \Pr\{f = 1, d = j, N_x\} &= \Pr\{d = j, N_x\} \Pr\{f = 1 \mid d = j, N_x\} \\ &= \Pr\{d = j, N_x\} \sum_{n'=0}^{g} \Pr\{\theta_j^{Rec} > M \mid d = j, N_x, n = n'\} \Pr\{n = n' \mid d = j, N_x\} \end{aligned} \tag{3.29} $$

where $g$ is the total number of glimpses the Recognizer takes while tracking an object,
and $n \le g$ is the number of glimpses which indicated the tracked object being a neutral.
Note that for every $j \in A$, we can calculate the maximal value of $n$ for which $\theta_j^{Rec} > M$.
Let $n_j^*$ denote this value of $n$. This means that for every $j \in A$:

$$ \Pr\{\theta_j^{Rec} > M \mid d = j, n = n'\} = \begin{cases} 1, & \text{if } n' \le n_j^* \\ 0, & \text{if } n' > n_j^* \end{cases} \tag{3.30} $$
Note that the value of $n_j^*$ depends on the specific AC $j$ to which the tracked object
moves, because $\theta_j^{Rec}$ takes into account movement recognition (i.e., an object moving to
AC $j$ may result in flagging it as a neutral while the same object moving to a different
AC $j'$ may result in flagging it as a target, even if in both cases the same number of
glimpses report “neutral”).


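Continuing our derivation from (3.29) we get:

$$ \Pr\{d = j, N_x\} \sum_{n'=0}^{g} \Pr\{\theta_j^{Rec} > M \mid d = j, N_x, n = n'\} \Pr\{n = n' \mid d = j, N_x\} = \Pr\{d = j, N_x\} \sum_{n'=0}^{n_j^*} \Pr\{n = n' \mid d = j, N_x\} \tag{3.31} $$

Using Bayes' formula for the first multiplicative term on the right-hand side of
(3.31) we get:

$$ \Pr\{d = j, N_x\} \sum_{n'=0}^{n_j^*} \Pr\{n = n' \mid d = j, N_x\} = \Pr\{d = j \mid N_x\} \Pr\{N_x\} \sum_{n'=0}^{n_j^*} \Pr\{n = n' \mid d = j, N_x\} \tag{3.32} $$

The right-hand side of (3.32) can be explicitly expressed in the following way:

$$ \Pr\{d = j \mid N_x\} \Pr\{N_x\} \sum_{n'=0}^{n_j^*} \Pr\{n = n' \mid d = j, N_x\} = P(x, j) \, \pi_x\left( T^R_{r,x}, \pi \right) \sum_{n'=0}^{n_j^*} \binom{g}{n'} v^{n'} (1 - v)^{g-n'} \tag{3.33} $$

Summarizing (3.29) - (3.33) we get that for every $j \in A$:

$$ \Pr\{f = 1, d = j, N_x\} = P(x, j) \, \pi_x\left( T^R_{r,x}, \pi \right) \sum_{n'=0}^{n_j^*} \binom{g}{n'} v^{n'} (1 - v)^{g-n'} \tag{3.34} $$

Following a very similar derivation we get the probability of event (v):

$$ \Pr\{f = 1, d = j, T_x\} = Q(x, j) \, \theta_x\left( T^R_{r,x}, \theta \right) \sum_{n'=0}^{n_j^*} \binom{g}{n'} (1 - u)^{n'} u^{g-n'} \tag{3.35} $$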

The derivation of the probability of event (iii) is similar to (iv) and (v), with two
key differences: we need to consider both the $T_x$ and $N_x$ events, and we are interested in the
cases where the number of glimpses reporting “neutral” is high enough to result in
flagging the tracked object as a neutral. This means that in the summation over the
possible number of glimpses reporting “neutral” we sum from $n_j^* + 1$ to $g$. We get that
for every $j \in A$:

$$ \Pr\{f = 0, d = j\} = P(x, j) \, \pi_x\left( T^R_{r,x}, \pi \right) \sum_{n'=n_j^*+1}^{g} \binom{g}{n'} v^{n'} (1 - v)^{g-n'} + Q(x, j) \, \theta_x\left( T^R_{r,x}, \theta \right) \sum_{n'=n_j^*+1}^{g} \binom{g}{n'} (1 - u)^{n'} u^{g-n'} \tag{3.36} $$
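
The flagging threshold on the number of "neutral" glimpses and the three resulting probabilities for a given destination AC can be checked numerically with the Python sketch below. All names and inputs are hypothetical. The sketch assumes that the target posterior after tracking is proportional to $Q(x,j)\,(1-u)^n u^{g-n}$ times the target prior (and analogously for a neutral), and that $u + v > 1$, so that the posterior decreases as more glimpses report "neutral" and the flagged outcomes are exactly $n = 0, \dots, n^*$.

```python
from math import comb


def theta_rec(n, g, u, v, pi_x, theta_x, p_xj, q_xj):
    """Target posterior after g glimpses (n report "neutral") and a move x -> j."""
    w_neutral = p_xj * pi_x * v ** n * (1.0 - v) ** (g - n)
    w_target = q_xj * theta_x * (1.0 - u) ** n * u ** (g - n)
    return w_target / (w_neutral + w_target)


def n_star(g, u, v, pi_x, theta_x, p_xj, q_xj, threshold):
    """Largest n for which the object is still flagged; -1 if it is never flagged."""
    flagged = [n for n in range(g + 1)
               if theta_rec(n, g, u, v, pi_x, theta_x, p_xj, q_xj) > threshold]
    return max(flagged) if flagged else -1


def flag_probabilities(g, u, v, pi_x, theta_x, p_xj, q_xj, threshold):
    """Probabilities of: not flagged, flagged neutral, flagged target (object moved to j)."""
    ns = n_star(g, u, v, pi_x, theta_x, p_xj, q_xj, threshold)
    neutral_pmf = [comb(g, n) * v ** n * (1.0 - v) ** (g - n) for n in range(g + 1)]
    target_pmf = [comb(g, n) * (1.0 - u) ** n * u ** (g - n) for n in range(g + 1)]
    flagged_neutral = p_xj * pi_x * sum(neutral_pmf[:ns + 1])
    flagged_target = q_xj * theta_x * sum(target_pmf[:ns + 1])
    not_flagged = (p_xj * pi_x * sum(neutral_pmf[ns + 1:])
                   + q_xj * theta_x * sum(target_pmf[ns + 1:]))
    return not_flagged, flagged_neutral, flagged_target


# Made-up inputs; pi_x and theta_x are the probabilities on the Recognizer's arrival.
args = dict(g=3, u=0.8, v=0.9, pi_x=0.3, theta_x=0.1, p_xj=0.4, q_xj=0.2, threshold=0.5)
print(n_star(**args), flag_probabilities(**args))
```

A useful sanity check is that the three probabilities add up to the probability that an object is present and moves to AC $j$, that is, $P(x,j)\pi_x + Q(x,j)\theta_x$.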


Combining (3.26) with the probabilities calculated in (3.27), (3.28), (3.34), (3.35)
and (3.36) we can completely present the probability mass function of $w$:


^  rp  R  '' 

^  r,x 
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i... 
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:  P{x,A)n[Tl,nYQ{x,A)e{T^^,,e) 
+P{x,x)n^[T^^,n)  ^  0 “^r" ) 

n  =n,  +1 

+Q{x,x)d,(T:^^,d)  ±  (y(l-«)”'«-'') 


(3.37) 


(3.38) 
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(3.39) 
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[(At  ) 
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lo  1 

=  p  (x,  J)  Z  ((")  (1  -  ) 


yj^A 


(3.40) 
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Pr^ 


r 

w 

L 


v^w  y 


=  Q(xj)i(T‘,.e)t{\;](\-u)'\ur' 


(3.41) 


Vye^ 

while  the  probability  that  w  will  get  any  other  value  than  those  which  appear  above  is  0. 


4.  Bellman  Equation  Calculation 


Finding the optimal decision policy for the Original Problem requires
calculating the value $V(s)$ of all states $s = (t, r, i, \pi, \theta)'$ using the Bellman equation (2.5).
The trivial case is when $t > T$, which results in $V(s) = 0$. Computing the value of a state
when $t < T$ requires calculating the maximal expected sum of the reward obtained
between the time the current decision is made and the time that decision is fathomed, and
the value of the next state. This maximal value is found by total
enumeration of all feasible decisions. In the following section, we discuss the calculation
of this expected sum of reward and value of the next state. Recall that the expected value
of a sum is always equal to the sum of the expected values:


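$$V(s) = \max_{x} E\left[c(w,s) + V\left(s^{M}(s,x,w)\right)\right] = \max_{x}\left\{E\left[c(w,s)\right] + E\left[V\left(s^{M}(s,x,w)\right)\right]\right\} \qquad (3.42)$$

where:

$$E\left[c(w,s)\right] = \sum_{w' \in W} c(w',s)\,\Pr\left[w = w'\right] \qquad (3.43)$$

$$E\left[V\left(s^{M}(s,x,w)\right)\right] = \sum_{w' \in W} V\left(s^{M}(s,x,w')\right)\Pr\left[w = w'\right] \qquad (3.44)$$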
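The enumeration behind (3.42)-(3.44) can be sketched in Python as a single Bellman backup for one state. This is a minimal sketch under stated assumptions: the callables `pmf`, `reward`, `transition` and `value` are hypothetical stand-ins for the probability mass function of $w$, the reward $c$, the state transition function and the value of the next state; none of them is taken from the thesis implementation.

```python
def bellman_backup(state, decisions, pmf, reward, transition, value):
    """Return max over x of E[c(w, s)] + E[V(next state)], by total enumeration.

    pmf(state, x) must return a dict mapping each realization w' to Pr[w = w'].
    """
    best = float("-inf")
    for x in decisions:
        dist = pmf(state, x)
        # First expected value: sum of c(w', s) * Pr[w = w'].
        expected_reward = sum(p * reward(w, state) for w, p in dist.items())
        # Second expected value: sum of V(next state) * Pr[w = w'].
        expected_next = sum(
            p * value(transition(state, x, w)) for w, p in dist.items()
        )
        best = max(best, expected_reward + expected_next)
    return best


# Toy usage with made-up numbers: two decisions, two information outcomes.
toy_pmf = lambda s, x: (
    {"hit": 0.5, "miss": 0.5} if x == "a" else {"hit": 0.25, "miss": 0.75}
)
toy_value = bellman_backup(
    state=0,
    decisions=["a", "b"],
    pmf=toy_pmf,
    reward=lambda w, s: 1.0 if w == "hit" else 0.0,
    transition=lambda s, x, w: s + 1,
    value=lambda s: 2.0,
)
```

The two sums are kept separate to mirror the split of the expectation into a reward term and a next-state term.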


Recall that $c(w,s)$ is actually only a function of the $\Delta t_w$ and $z_w$ components of
the information random vector $w$ (2.3), and so for the calculation of the first expected
value we only need the joint probability distribution of these two random variables, and
not the complete joint probability distribution of all four components of $w$. Similarly,
$s^{M}(s,x,w)$ is actually a function of just the $\Delta t_w$, $r_w$ and $i_w$ components of $w$ (3.16), and so
for the calculation of the second expected value we only need the joint probability
distribution of these three random variables and not the complete joint probability
distribution of all four components of $w$. We define two new functions which represent
these dependencies:

$$\tilde{c}(\Delta t_w, z_w, s) = c(w,s) \quad \forall s \in S,\ \forall w = (\Delta t_w, r_w, i_w, z_w)' \in W \qquad (3.45)$$

$$\tilde{s}^{M}(s, x, \Delta t_w, r_w, i_w) = s^{M}(s,x,w) \quad \forall s \in S,\ \forall x \in A,\ \forall w = (\Delta t_w, r_w, i_w, z_w)' \in W \qquad (3.46)$$


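Using these functions we can now rewrite the expected values from (3.43) and
(3.44):

$$E\left[c(w,s)\right] = E\left[\tilde{c}(\Delta t_w, z_w, s)\right] = \sum_{z'=0}^{1} \sum_{\Delta t'=0}^{\infty} \tilde{c}(\Delta t', z', s)\,\Pr\left\{\Delta t_w = \Delta t', z_w = z'\right\} \qquad (3.47)$$

$$E\left[V\left(s^{M}(s,x,w)\right)\right] = \sum_{\Delta t'=0}^{\infty} \sum_{r' \in A} \sum_{i' \in A} V\left(\tilde{s}^{M}(s,x,\Delta t',r',i')\right)\Pr\left\{\Delta t_w = \Delta t', r_w = r', i_w = i'\right\} \qquad (3.48)$$

The joint probabilities appearing in (3.47) and (3.48) are marginal probabilities of
the full probability mass function of $w = (\Delta t_w, r_w, i_w, z_w)'$:

$$\Pr\left\{\Delta t_w = \Delta t', z_w = z'\right\} = \sum_{r' \in A} \sum_{i' \in A} \Pr\left\{\Delta t_w = \Delta t', r_w = r', i_w = i', z_w = z'\right\} \qquad (3.49)$$

$$\Pr\left\{\Delta t_w = \Delta t', r_w = r', i_w = i'\right\} = \sum_{z'=0}^{1} \Pr\left\{\Delta t_w = \Delta t', r_w = r', i_w = i', z_w = z'\right\} \qquad (3.50)$$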
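The marginalization in (3.49) and (3.50) can be sketched in Python as follows. The joint probability mass function is assumed here to be stored as a dictionary keyed by tuples `(dt, r, i, z)`; that storage layout and the numbers in the example are illustrative assumptions, not the thesis implementation.

```python
from collections import defaultdict


def marginals(joint):
    """Marginalize a joint PMF over (dt, r, i, z).

    Returns two dicts:
      dt_z[(dt, z)]      -- sums out r and i, as in (3.49)
      dt_r_i[(dt, r, i)] -- sums out z, as in (3.50)
    """
    dt_z = defaultdict(float)
    dt_r_i = defaultdict(float)
    for (dt, r, i, z), p in joint.items():
        dt_z[(dt, z)] += p
        dt_r_i[(dt, r, i)] += p
    return dict(dt_z), dict(dt_r_i)


# Made-up joint PMF with three possible realizations of w.
example_joint = {
    (1, 2, 3, 0): 0.25,
    (1, 2, 3, 1): 0.25,
    (2, 4, 3, 0): 0.5,
}
dt_z, dt_r_i = marginals(example_joint)
```

Only realizations with non-zero probability need to be stored, which matches the observation that $w$ takes few distinct values for a given state and decision.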

Using (3.50) and the probability mass function of $w$ in (3.37)-(3.41) we can
explicitly present the expected value in (3.48):
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F(/'(s,x,w))]=(/)(//)+2;((///)(/r))+2;(('')("))+(™)("") 

j€.A  j€.A 

where: 

(/)  =  v{^s^ 

«'=ovv”  y  7 

(F)  =  F(5'^{s,x,r,>l,7,f)) 

«'=«•+!  U”  7  V”  7 

(F//)  =  F(s"{s,x,r,>l,x,f)) 


(3.51) 


Similarly, we can use (3.49) and the probability mass function of $w$ to explicitly
present the expected value in (3.47) (recall that the reward is non-zero only when $z_w = 1$):


£[c(w,s)]  = 

=AK,s)T. 

jeA 


(1  +  y) 


n'=0 


\n'  I  s  g-«' 


(3.52) 


Calculating the value of the initial state according to the Bellman equation (2.5)
using our results in (3.51) and (3.52) requires going over all combinations of states,
decisions and possible information realizations. Recall that the state is defined as
$s = (t, r, i, \pi, \theta)'$. Though the two probability vectors $\theta$ and $\pi$ can theoretically take any
value in the continuous interval $[0,1]$, their values are determined by the history of the
Recognizer location. The number of different paths the Recognizer can take in the time
horizon $T$ is $|A|^{T}$, and therefore the state space size is $|S| = T \cdot |A| \cdot |A| \cdot |A|^{T} = T \cdot |A|^{T+2}$.

Examination of the probability mass function of the information $w$ in (3.37)-(3.41) shows
that for every state $s$ and decision $x$, the size of the information space is
$|W| = 3 \cdot |A| + 2$ (by counting the number of possible values of $w$ in the explicit PMF
formulas). Combining all together, calculating the Bellman equation in the Original Problem
requires going over all combinations of state, decision and information realization, which
results in a run time of $O\left(T \cdot |A|^{T+3} \cdot (3 \cdot |A| + 2)\right)$ for the backward recursion dynamic
programming algorithm (see Powell [8], p. 50). This shows why the Original Problem is
intractable for realistic situations with more than a few ACs and time periods. As an
example, even a small scenario with 10 ACs and time horizon $T = 5$ results in 16 billion
calculations.
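The 16 billion figure can be checked with a few lines of Python. The sketch assumes the state space size is $T \cdot |A|^{T+2}$, that there are $|A|$ decisions per state and $3 \cdot |A| + 2$ information realizations per state-decision pair; this reading of the counts is an inference that happens to reproduce the quoted number, and the function name is introduced only for illustration.

```python
def original_problem_steps(T: int, n_ac: int) -> int:
    """Count state x decision x information combinations for the Original Problem."""
    states = T * n_ac ** (T + 2)   # assumed state space size T * |A|^(T+2)
    decisions = n_ac               # one candidate AC per decision
    realizations = 3 * n_ac + 2    # size of the information space |W|
    return states * decisions * realizations


# Small scenario from the text: 10 ACs, time horizon T = 5.
steps = original_problem_steps(T=5, n_ac=10)
print(f"{steps:,}")
```

Because the exponent grows with $T$, adding a single time period multiplies the count by roughly $|A|$.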

B. UPPER BOUND PROBLEM

The Upper Bound Problem is very similar to the Original Problem. A
decision is defined in the same way (though the two problems have different optimal
decision policies). The information in the Upper Bound Problem is defined in the same way
as in the Original Problem, though its probability distribution is different. The state
transition functions of the two problems are
closely related. The rewards in the two problems are practically the same, except that we
formally use different functional notation because they operate on different state spaces.
Lastly, the Bellman equations of the two problems appear to be almost identical, but
they use slightly different variables as presented earlier in (2.9). In the following sections,
we formally define the Upper Bound Problem and discuss the differences with respect
to the Original Problem.

1. States

Recall that we previously defined a state in the Upper Bound Problem as
$\bar{s} = (\bar{t}, \bar{r}, \bar{i})'$, where $\bar{t}$ is the
time, $\bar{r}$ is the Recognizer’s location and $\bar{i}$ is the Interceptor’s location (2.8). The space
of all possible states is denoted $\bar{S}$.

2. Decision

A decision $x \in A$ is the next AC to be visited by the Recognizer. The decision
in the Upper Bound Problem is defined in the same way as in the Original Problem.

3. Information

Let the random vector $\bar{w} = (\Delta t_{\bar{w}}, r_{\bar{w}}, i_{\bar{w}}, z_{\bar{w}})'$ denote the information obtained when
a decision is fathomed in the Upper Bound Problem. The definition of each of these
four vector components is exactly the same as in the Original Problem, but because
each of these random variables has a different probability mass function than in the
Original Problem, we use different names for these random variables. The space of
possible information values $W$ is the same for both problems (though the probability of
obtaining each realization from these possible values is different).

4. State Transition

The state transition function in the Upper Bound Problem differs from that in the
Original Problem by the fact that it does not include the decision $x$ as an argument. Recall
that the state transition function in the Original Problem requires $x$ as an argument in
order to calculate the new state’s probability vectors $\pi$ and $\theta$, which are not a part of
the state definition in the Upper Bound Problem, and as such they do not need to be
calculated. The decision $x$ is of course a key factor in determining the next state, but it
does so indirectly by influencing the obtained information $\bar{w}(\bar{s},x)$:

$$\bar{s}^{M} : \bar{S} \times W \to \bar{S} \qquad (3.53)$$

$$\bar{s}^{M}\left(\bar{s}, \bar{w}(\bar{s},x)\right) = \left(\bar{t} + \Delta t_{\bar{w}},\ r_{\bar{w}},\ i_{\bar{w}}\right)' \qquad (3.54)$$

where $\bar{s} = (\bar{t}, \bar{r}, \bar{i})'$ is the state and $\bar{w} = (\Delta t_{\bar{w}}, r_{\bar{w}}, i_{\bar{w}}, z_{\bar{w}})'$ is the obtained information.
5. Reward

The reward $\bar{c}$ is a function of the information $\bar{w}$ and the state $\bar{s}$, and is defined as
follows:


c(w,s) 


c:WxS- 


0, 


^  +At„<r 

^  >  T 


(3.55) 

(3.56) 


where $\bar{w} = (\Delta t_{\bar{w}}, r_{\bar{w}}, i_{\bar{w}}, z_{\bar{w}})'$, $\bar{s} = (\bar{t}, \bar{r}, \bar{i})'$ and $T$ is the time horizon used in the scenario.

This definition is practically the same as in the Original Problem, just using the
appropriate Upper Bound Problem’s variables.


6.  Bellman  Equation 


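We define the value of being in state $\bar{s}$ as the maximum expected cumulative
reward that can be obtained from the time of being in state $\bar{s}$ to the time horizon $T$. This
value $\bar{V}(\bar{s})$ is given by the Bellman equation as presented earlier (2.9):

$$\bar{V}(\bar{s}) = \begin{cases} \max_{x} E\left[\bar{c}(\bar{w},\bar{s}) + \bar{V}\left(\bar{s}^{M}(\bar{s},\bar{w})\right)\right], & \bar{t} < T \\ 0, & \bar{t} > T \end{cases} \qquad (3.57)$$

where $\bar{t}$ is the time of state $\bar{s} = (\bar{t}, \bar{r}, \bar{i})'$.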

7.  Bellman  Equation  Calculation 


The formulas used for the calculation of a state’s value in the Upper Bound
Problem are similar to those in the Original Problem. We use the same formulas for
the two expected value terms in the Original Problem Bellman equation, appearing in
(3.51) and in (3.52), when we replace the current state’s $\theta$ and $\pi$ by the steady-state
$\theta^{0}$ and $\pi^{0}$, with the probability components for both a target and a neutral at the current
Recognizer’s location $\bar{r}$ set to 0. Let $\hat{\theta}^{0}$ and $\hat{\pi}^{0}$ denote these updated probability vectors:

$$\hat{\theta}^{0}_{a} = \begin{cases} \theta^{0}_{a}, & a \neq \bar{r} \\ 0, & a = \bar{r} \end{cases} \qquad (3.58)$$

$$\hat{\pi}^{0}_{a} = \begin{cases} \pi^{0}_{a}, & a \neq \bar{r} \\ 0, & a = \bar{r} \end{cases} \qquad (3.59)$$

Substituting $\theta$ and $\pi$ with (3.58) and (3.59) in (3.51) and (3.52), while explicitly
computing the next state using the state transition function in (3.54), we get the following
formulas for computing the Bellman equation in the Upper Bound Problem:


[III)  =  =  (t  +rA  +1 + 


e[v(j-  ={t  +AC,r:,C))]  =  (/)(//)+X(WM+Z((^)(^^))+ 

j&A  jsA 

where: 

[l)^v[s^  ^(T  +  Tl^,x,T)) 

JJ)) 

n’=o''^  ^  ^ 

(r)  =  F(s'”=(F+r,*+i,y,r)) 

^«'=«y+l  ' 

[Vll)^v[s^(T  +  T-^-+\,xj)) 


(/r)  = 


h.60) 


(3.61) 


As in the Original Problem, computing the value of the initial state requires going
over all combinations of state, decision and information realization. The main difference
from the corresponding calculation in the Original Problem is the much smaller size of
the state space $|\bar{S}| = T \cdot |A|^{2}$, due to the fact that the probability vectors $\pi$ and $\theta$ are not a
part of the Upper Bound Problem state $\bar{s} = (\bar{t}, \bar{r}, \bar{i})'$. This means that the run time of
calculating the value of the initial state in the Upper Bound Problem is
$O\left(T \cdot |A|^{3} \cdot (3 \cdot |A| + 2)\right)$, which is much faster than the Original Problem run time, which
is $O\left(T \cdot |A|^{T+3} \cdot (3 \cdot |A| + 2)\right)$ (as discussed in Chapter III.A.4).

When there is a small number of possible transitions from each AC in the AOI
(i.e., once in a specific AC, an object can only move to a small number of neighboring
ACs), there is a way to improve the run time of this algorithm. This is accomplished by
calculating the possible realizations of the information $\bar{w}$ only for those ACs to which
the object has a non-zero probability of having moved. Let the set of ACs reachable from AC $a$
in a single time step transition be denoted as the forward star of AC $a$ (for all ACs
$a \in A$). Let $\mu$ denote the maximal of all ACs’ forward star sizes. The resulting run time of
$O\left(T \cdot |A|^{3} \cdot (3 \cdot \mu + 2)\right)$ is indeed an improvement when $\mu < |A|$.
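The effect of the forward star restriction can be made concrete with a short Python sketch that evaluates the operation counts inside the two big-O expressions. The values $T = 48$, $|A| = 25$ and $\mu = 5$ are taken from the baseline scenario of Chapter IV; treating the expressions as exact counts, and the function name itself, are simplifying assumptions made for illustration.

```python
def upper_bound_steps(T: int, n_ac: int, branching: int) -> int:
    """Evaluate T * |A|^3 * (3 * branching + 2).

    branching is |A| without the forward star restriction and mu with it.
    """
    return T * n_ac**3 * (3 * branching + 2)


full = upper_bound_steps(T=48, n_ac=25, branching=25)      # all ACs considered
restricted = upper_bound_steps(T=48, n_ac=25, branching=5)  # forward star, mu = 5
print(full, restricted, round(full / restricted, 2))
```

Under these assumptions the restriction reduces the count by a factor of about 4.5, since only the information-space factor changes.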

C. REWARD CALCULATION USING HEURISTIC’S POLICY

Theoretically, calculating the reward collected following the heuristic decision
policy could be done in a similar way to the backward solution algorithm for solving the
Bellman equation in the Original Problem discussed in Chapter III.A.4, using the
formulas for the expected value terms in (3.51) and (3.52). This calculation is simpler
than calculating the collected reward following the optimal decision policy, because we
do not need to compare the possible future rewards given every possible decision at each
state, but only the future state resulting from making a decision based on our heuristic.
Nevertheless, this simpler calculation is still intractable as the state space is still very
large (we still need to solve for each state in the Original Problem state space $S$).

The method used in this thesis is to estimate the value of each state following the
heuristic decision policy using a Monte-Carlo simulation instead of an exact calculation
as discussed in the above paragraph. Instead of directly calculating the expected value in
the Bellman equation, we simulate the scenario while randomizing the required
realizations of the obtained information $w$ using its known probability mass function. We
continue to generate these single runs while keeping track of the mean and variance of
the collected rewards until the confidence interval for the expected reward is sufficiently
small.
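The stopping rule described above can be sketched in Python as follows. This is a minimal sketch, not the MATLAB implementation used in the thesis: `simulate_run` is a hypothetical callable returning the reward of one simulated run, and the normal-approximation interval, the 95% level (`z = 1.96`), the tolerance and the run limits are all assumptions chosen for illustration.

```python
import math
import random


def estimate_expected_reward(simulate_run, half_width=0.05, z=1.96,
                             min_runs=30, max_runs=100_000):
    """Run simulations until the confidence interval half-width is small enough.

    Mean and variance are updated incrementally (Welford's method), so the
    individual run rewards do not need to be stored.
    """
    n, mean, m2 = 0, 0.0, 0.0
    while n < max_runs:
        reward = simulate_run()
        n += 1
        delta = reward - mean
        mean += delta / n
        m2 += delta * (reward - mean)
        if n >= min_runs:
            variance = m2 / (n - 1)
            if z * math.sqrt(variance / n) <= half_width:
                break
    variance = m2 / (n - 1) if n > 1 else 0.0
    return mean, variance, n


# Toy stand-in for a scenario run: reward 1 with probability 0.5, else 0.
rng = random.Random(0)
mean, variance, runs = estimate_expected_reward(
    lambda: 1.0 if rng.random() < 0.5 else 0.0
)
```

The minimum number of runs guards against stopping on an early streak of identical rewards, which would otherwise give a zero variance estimate.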


Figures 5 through 7 demonstrate three states taken from a heuristic calculation
run. Each figure presents the probability map for neutrals at each AC in the AOI ($\pi$) and
the probability map for targets at each AC in the AOI ($\theta$), together with the decision $x$,
the Recognizer location $r$ at the time of the decision, and the Recognizer location $r_w$ at
the time the decision has been fathomed. Figure 5 shows the state at time $t = 0$, which
represents the steady-state. The Recognizer initial location is AC #18; it decides to visit
AC #13, which turns out to be empty, and so the Recognizer eventually ends up in AC
#13. Figure 6 shows the state at time $t = 1$, at which the Recognizer location is AC #13
(where it ended up in the previous state transition). Note that both the probability of a
target and the probability of a neutral in AC #13 were set to 0 (as discussed in Chapter
III.A.2). This time the Recognizer decides to visit AC #17, which also turns out to be
empty.

Figure 5. A state example (t = 0)

Figure 6. A state example (t = 1)

Figure 7. A state example (t = 18)

Figure 7 shows the state at time $t = 18$, at which the Recognizer location is AC
#9. Note that the overall probabilities of targets and neutrals in each AC in the AOI seem
to be much lower than in the beginning of the run (i.e., in steady-state). This is due to the
fact that many ACs have been visited and the corresponding probability components of $\pi$
and $\theta$ were set to 0 after each visit, and as time advanced low probabilities “propagated”
to the rest of the AOI according to the neutrals and targets Markov chains. The
Recognizer has decided to visit AC #7, in which an object was detected and so it was
tracked while moving to AC #6 (from this figure we cannot tell whether the object was
flagged as a likely target or not, nor if it was eventually intercepted).

Figure 8 demonstrates the result of a heuristic calculation run with time horizon
$T = 48$. In the bottom part of the figure we can identify seven events of object detection,
out of which three have not resulted in flagging the tracked object as a likely target, and
so the Interceptor has not been called for interception ($t = 0, 13, 17$), while the remaining
four detection events resulted in the tracked object being flagged as a likely target and so
the Interceptor has been called in for interception ($t = 5, 24, 31, 36$). Out of these four
interception attempts, we can identify two successful target interceptions which resulted
in collecting rewards at times $t = 9, 33$ (the delay is due to the time it takes the
Interceptor to reach the flagged object and intercept it). The two remaining intercepted
objects were in fact neutrals.

Figure 8. A heuristic run summary example with two intercepted targets ($T = 48$)
IV. NUMERICAL CASE STUDY

A. OVERVIEW

In this chapter, we present a numerical case study that addresses the
implementation of the different models, the specific scenarios, the data used, the results
of the different runs, and the insights resulting from this analysis. The main scenario
chosen to be analyzed is of a 25-AC strait-like AOI, with slightly different movement
patterns for neutrals and targets, and with a time horizon of 48 time steps (representing 12
real-life hours). All scenarios were implemented and analyzed using MATLAB. Altogether,
we have run seven different scenarios, including 47 Upper Bound Problem
expected value calculations, 43 heuristic’s expected value estimations, and over 50
MATLAB run hours.

B. SPECIFIC SCENARIOS AND DATA

The numerical case study presented in this chapter includes a baseline scenario,
and several variations of this baseline scenario for sensitivity analysis and evaluation of
the robustness of the baseline scenario’s results and insights.

1.  Baseline  Scenario 

The baseline scenario represents a strait-like AOI, with land on the North and
South edges of the AOI (i.e., no arrivals from or departures to the North and South of the
AOI). The AOI $A$ is a 5-by-5 square grid of 25 total ACs (see Figure 9). Each AC
represents a 5-by-5 nautical-mile area in real life, for a total AOI area of 625 nm². The
boundary of the AOI, $E$, is the five ACs on the West edge of the AOI (ACs #1-5) and the
five ACs on the East edge of the AOI (ACs #6-10). AC #26 ($A_0$) represents the area
outside of the AOI. The single time-step probabilities of a neutral and of a target
arriving at each of the ACs in the boundary are 0.05 and 0.01,
respectively. The Markov chains representing neutrals movement and targets movement
are slightly different to represent an operational situation in which one has some
intelligence or other prior knowledge about the differences in the expected movement of
the two object types. In a single time step, both types of objects can only move to the
four immediate neighboring ACs, with no diagonal movement. This results in having a
maximal forward star size $\mu = 5$. Generally speaking, we assume neutrals tend to move
across the strait (West-East traffic), while targets tend to move perpendicular to the
shipping lanes (North-South traffic). This distinction is not absolute, meaning that both a
neutral object and a target can move to the exact same neighboring ACs, just with
different probabilities (i.e., there is no feasible movement which is unique to either
targets or neutrals).


        ◄──────── 25 nm ────────►

                  Land
          5   15   20   25   10
          4   14   19   24    9
          3   13   18   23    8
          2   12   17   22    7
          1   11   16   21    6
                  Land

Figure 9. The baseline scenario AOI

For neutrals, the probability of moving on the West-East axis is always double
the probability of moving on the North-South axis. For targets, the situation is
flipped, with double the probability of moving on the North-South axis than on the
West-East axis. For each of the ACs in $A$, there is a 0.1 single time step probability of staying
in the same AC. The probability of moving back from $A_0$ to any AC in $A$ is 0, and an
object currently in $A_0$ will remain there with probability 1 ($A_0$ is an absorbing state in
the individual Markov movement process of each of the objects). Figure 10 shows
examples of single time step transition probabilities for targets and for neutrals, in a
geographical manner, while Table 1 and Table 2 present the complete Markov transition
probability matrices for neutrals and targets, respectively.
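The movement rules above can be turned into a transition matrix with a short Python sketch. It assumes the AC numbering of Figure 9 (columns numbered 1-5, 11-15, 16-20, 21-25 and 6-10 from West to East, increasing from South to North), that leaving the grid across the West or East edge leads to the outside area AC #26, and that the probability mass on each axis is split evenly among the feasible moves on that axis. The even split is an inference from the printed matrices rather than a rule stated in the text, so the sketch is not guaranteed to match every printed entry.

```python
COL_START = [1, 11, 16, 21, 6]  # first AC number of each West-to-East column
OUTSIDE = 26                    # area outside the AOI (absorbing)


def ac_number(col: int, row: int) -> int:
    """AC number at grid column col (0 = West) and row (0 = South)."""
    return COL_START[col] + row


def build_matrix(we_mass: float, ns_mass: float, stay: float = 0.1):
    """Return a 1-indexed 27x27 list of lists; entry [a][b] is Pr(a -> b)."""
    P = [[0.0] * 27 for _ in range(27)]
    for col in range(5):
        for row in range(5):
            a = ac_number(col, row)
            P[a][a] = stay
            # West-East moves: stepping off the grid leads to OUTSIDE.
            we_targets = [
                ac_number(c, row) if 0 <= c < 5 else OUTSIDE
                for c in (col - 1, col + 1)
            ]
            for b in we_targets:
                P[a][b] += we_mass / len(we_targets)
            # North-South moves: land blocks movement off the grid.
            ns_targets = [ac_number(col, r) for r in (row - 1, row + 1) if 0 <= r < 5]
            for b in ns_targets:
                P[a][b] += ns_mass / len(ns_targets)
    P[OUTSIDE][OUTSIDE] = 1.0
    return P


neutrals = build_matrix(we_mass=0.6, ns_mass=0.3)  # West-East twice as likely
targets = build_matrix(we_mass=0.3, ns_mass=0.6)   # North-South twice as likely
```

Each row sums to 1 by construction, because the stay probability and the two axis masses add up to 1 in both parameter sets.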


Figure  10.  Partial  Markov  transition  prob.  of  neutrals  (right)  and  targets  (left) 


Both the Recognizer and the Interceptor in this baseline scenario start in the
middle of the AOI in AC #18. A single time step represents 15 minutes in real life. The
operational time horizon used is 12 hours, and so we have a time horizon of 48 time
steps. We assume the Interceptor has roughly the same velocity as both neutrals and
targets, which is one AC per time step (approximately 20 knots in real life). The
Recognizer velocity is assumed to be four times the velocity of the Interceptor and of the
objects (approximately 80 knots in real life). The Recognizer’s and Interceptor’s
traveling times between each pair of ACs are calculated as follows:


$$\tau^{R}_{a,a'} = \frac{|a - a'|}{velocity_{R}} \qquad (4.1)$$

$$\tau^{I}_{a,a'} = \frac{|a - a'|}{velocity_{I}} \qquad (4.2)$$


where $velocity_{R}$ and $velocity_{I}$ are the Recognizer’s and Interceptor’s velocities, respectively,
and $|a - a'|$ is the Euclidean metric representing the geographical distance between AC
$a$ and AC $a'$. Traveling times between all pairs of ACs are presented in Table 3 and
Table 4. Recognizer traveling times include detection time and Interceptor traveling times
include boarding time.
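The travel time calculation can be sketched in Python as follows. The sketch assumes the AC layout of Figure 9, distances measured in AC units between cell centers, and times rounded up to whole time steps with a minimum of one step. The rounding and the one-step minimum are inferences from the entries of Table 3 and Table 4 (possibly reflecting the included detection and boarding times), not rules stated explicitly in the text.

```python
import math

COL_START = [1, 11, 16, 21, 6]  # first AC number of each West-to-East column


def position(ac: int):
    """Return (column, row) of an AC in the 5-by-5 grid of Figure 9."""
    for col, start in enumerate(COL_START):
        if start <= ac < start + 5:
            return col, ac - start
    raise ValueError(f"AC #{ac} is not inside the AOI")


def travel_time(a: int, b: int, velocity: float) -> int:
    """Whole time steps needed to move from AC a to AC b.

    velocity is in ACs per time step (assumed 4 for the Recognizer and
    1 for the Interceptor, as in the baseline scenario).
    """
    (c1, r1), (c2, r2) = position(a), position(b)
    distance = math.hypot(c1 - c2, r1 - r2)
    return max(1, math.ceil(distance / velocity))


recognizer_1_to_7 = travel_time(1, 7, velocity=4)    # Table 3 lists 2
interceptor_1_to_10 = travel_time(1, 10, velocity=1)  # Table 4 lists 6
```

Both example values agree with the corresponding table entries, which is the basis for the assumed rounding rule.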
AC#    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26
  1  0.1  0.3    0    0    0    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.3
  2 0.15  0.1 0.15    0    0    0    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0    0    0    0    0  0.3
  3    0 0.15  0.1 0.15    0    0    0    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0    0    0    0  0.3
  4    0    0 0.15  0.1 0.15    0    0    0    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0    0    0  0.3
  5    0    0  0.3    0  0.1    0    0    0    0    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0    0  0.3
  6    0    0    0    0    0  0.1  0.3    0    0    0    0    0    0    0    0    0    0    0    0    0  0.3    0    0    0    0  0.3
  7    0    0    0    0    0 0.15  0.1 0.15    0    0    0    0    0    0    0    0    0    0    0    0    0  0.3    0    0    0  0.3
  8    0    0    0    0    0    0 0.15  0.1 0.15    0    0    0    0    0    0    0    0    0    0    0    0    0  0.3    0    0  0.3
  9    0    0    0    0    0    0    0 0.15  0.1 0.15    0    0    0    0    0    0    0    0    0    0    0    0    0  0.3    0  0.3
 10    0    0    0    0    0    0    0    0  0.3  0.1    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.3  0.3
 11  0.3    0    0    0    0    0    0    0    0    0  0.1  0.3    0    0    0  0.3    0    0    0    0    0    0    0    0    0    0
 12    0  0.3    0    0    0    0    0    0    0    0 0.15  0.1 0.15    0    0    0  0.3    0    0    0    0    0    0    0    0    0
 13    0    0  0.3    0    0    0    0    0    0    0    0 0.15  0.1 0.15    0    0    0  0.3    0    0    0    0    0    0    0    0
 14    0    0    0  0.3    0    0    0    0    0    0    0    0 0.15  0.1 0.15    0    0    0  0.3    0    0    0    0    0    0    0
 15    0    0    0    0  0.3    0    0    0    0    0    0    0    0  0.3  0.1    0    0    0    0  0.3    0    0    0    0    0    0
 16    0    0    0    0    0    0    0    0    0    0  0.3    0    0    0    0  0.1  0.3    0    0    0  0.3    0    0    0    0    0
 17    0    0    0    0    0    0    0    0    0    0    0  0.3    0    0    0 0.15  0.1 0.15    0    0    0  0.3    0    0    0    0
 18    0    0    0    0    0    0    0    0    0    0    0    0  0.3    0    0    0 0.15  0.1 0.15    0    0    0  0.3    0    0    0
 19    0    0    0    0    0    0    0    0    0    0    0    0    0  0.3    0    0    0 0.15  0.1 0.15    0    0    0  0.3    0    0
 20    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.3    0    0    0  0.3  0.1    0    0    0    0  0.3    0
 21    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0  0.3    0    0    0    0  0.1  0.3    0    0    0    0
 22    0    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0  0.3    0    0    0 0.15  0.1 0.15    0    0    0
 23    0    0    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0  0.3    0    0    0 0.15  0.1 0.15    0    0
 24    0    0    0    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0  0.3    0    0    0 0.15  0.1 0.15    0
 25    0    0    0    0    0    0    0    0    0  0.3    0    0    0    0    0    0    0    0    0  0.3    0    0    0  0.3  0.1    0
 26    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

Table 1. Neutrals Markov transition probabilities matrix P


AC#    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26
  1  0.1  0.6    0    0    0    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0    0    0    0    0    0 0.15
  2  0.3  0.1  0.3    0    0    0    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0    0    0    0    0 0.15
  3    0  0.3  0.1  0.3    0    0    0    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0    0    0    0 0.15
  4    0    0  0.3  0.1  0.3    0    0    0    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0    0    0 0.15
  5    0    0  0.6    0  0.1    0    0    0    0    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0    0 0.15
  6    0    0    0    0    0  0.1  0.6    0    0    0    0    0    0    0    0    0    0    0    0    0 0.15    0    0    0    0 0.15
  7    0    0    0    0    0  0.3  0.1  0.3    0    0    0    0    0    0    0    0    0    0    0    0    0 0.15    0    0    0 0.15
  8    0    0    0    0    0    0  0.3  0.1  0.3    0    0    0    0    0    0    0    0    0    0    0    0    0 0.15    0    0 0.15
  9    0    0    0    0    0    0    0  0.3  0.1  0.3    0    0    0    0    0    0    0    0    0    0    0    0    0 0.15    0 0.15
 10    0    0    0    0    0    0    0    0  0.6  0.1    0    0    0    0    0    0    0    0    0    0    0    0    0    0 0.15 0.15
 11 0.15    0    0    0    0    0    0    0    0    0  0.1  0.6    0    0    0 0.15    0    0    0    0    0    0    0    0    0    0
 12    0 0.15    0    0    0    0    0    0    0    0  0.3  0.1  0.3    0    0    0 0.15    0    0    0    0    0    0    0    0    0
 13    0    0 0.15    0    0    0    0    0    0    0    0  0.3  0.1  0.3    0    0    0 0.15    0    0    0    0    0    0    0    0
 14    0    0    0 0.15    0    0    0    0    0    0    0    0  0.3  0.1  0.3    0    0    0 0.15    0    0    0    0    0    0    0
 15    0    0    0    0 0.15    0    0    0    0    0    0    0    0  0.6  0.1    0    0    0    0 0.15    0    0    0    0    0    0
 16    0    0    0    0    0    0    0    0    0    0 0.15    0    0    0    0  0.1  0.6    0    0    0 0.15    0    0    0    0    0
 17    0    0    0    0    0    0    0    0    0    0    0 0.15    0    0    0  0.3  0.1  0.3    0    0    0 0.15    0    0    0    0
 18    0    0    0    0    0    0    0    0    0    0    0    0 0.15    0    0    0  0.3  0.1  0.3    0    0    0 0.15    0    0    0
 19    0    0    0    0    0    0    0    0    0    0    0    0    0 0.15    0    0    0  0.3  0.1  0.3    0    0    0 0.15    0    0
 20    0    0    0    0    0    0    0    0    0    0    0    0    0    0 0.15    0    0    0  0.6  0.1    0    0    0    0 0.15    0
 21    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0 0.15    0    0    0    0  0.1  0.6    0    0    0    0
 22    0    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0 0.15    0    0    0  0.3  0.1  0.3    0    0    0
 23    0    0    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0 0.15    0    0    0  0.3  0.1  0.3    0    0
 24    0    0    0    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0 0.15    0    0    0  0.3  0.1  0.3    0
 25    0    0    0    0    0    0    0    0    0 0.15    0    0    0    0    0    0    0    0    0 0.15    0    0    0  0.6  0.1    0
 26    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

Table 2


AC#  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
  1  1  1  1  1  1  1  2  2  2  2  1  1  1  1  2  1  1  1  1  2  1  1  1  2  2
  2  1  1  1  1  1  2  1  2  2  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2
  3  1  1  1  1  1  2  2  1  2  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  4  1  1  1  1  1  2  2  2  1  2  1  1  1  1  1  1  1  1  1  1  2  1  1  1  1
  5  1  1  1  1  1  2  2  2  2  1  2  1  1  1  1  2  1  1  1  1  2  2  1  1  1
  6  1  2  2  2  2  1  1  1  1  1  1  1  1  2  2  1  1  1  1  2  1  1  1  1  2
  7  2  1  2  2  2  1  1  1  1  1  1  1  1  1  2  1  1  1  1  1  1  1  1  1  1
  8  2  2  1  2  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  9  2  2  2  1  2  1  1  1  1  1  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 10  2  2  2  2  1  1  1  1  1  1  2  2  1  1  1  2  1  1  1  1  2  1  1  1  1
 11  1  1  1  1  2  1  1  1  2  2  1  1  1  1  1  1  1  1  1  2  1  1  1  1  2
 12  1  1  1  1  1  1  1  1  1  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 13  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 14  1  1  1  1  1  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 15  2  1  1  1  1  2  2  1  1  1  1  1  1  1  1  2  1  1  1  1  2  1  1  1  1
 16  1  1  1  1  2  1  1  1  1  2  1  1  1  1  2  1  1  1  1  1  1  1  1  1  2
 17  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 18  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 19  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 20  2  1  1  1  1  2  1  1  1  1  2  1  1  1  1  1  1  1  1  1  2  1  1  1  1
 21  1  1  1  2  2  1  1  1  1  2  1  1  1  1  2  1  1  1  1  2  1  1  1  1  1
 22  1  1  1  1  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 23  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 24  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 25  2  2  1  1  1  2  1  1  1  1  2  1  1  1  1  2  1  1  1  1  1  1  1  1  1

Table 3. Recognizer’s all AC pairs transition times


AC#  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
  1  1  1  2  3  4  4  5  5  5  6  1  2  3  4  5  2  3  3  4  5  3  4  4  5  5
  2  1  1  1  2  3  5  4  5  5  5  2  1  2  3  4  3  2  3  3  4  4  3  4  4  5
  3  2  1  1  1  2  5  5  4  5  5  3  2  1  2  3  3  3  2  3  3  4  4  3  4  4
  4  3  2  1  1  1  5  5  5  4  5  4  3  2  1  2  4  3  3  2  3  5  4  4  3  4
  5  4  3  2  1  1  6  5  5  5  4  5  4  3  2  1  5  4  3  3  2  5  5  4  4  3
  6  4  5  5  5  6  1  1  2  3  4  3  4  4  5  5  2  3  3  4  5  1  2  3  4  5
  7  5  4  5  5  5  1  1  1  2  3  4  3  4  4  5  3  2  3  3  4  2  1  2  3  4
  8  5  5  4  5  5  2  1  1  1  2  4  4  3  4  4  3  3  2  3  3  3  2  1  2  3
  9  5  5  5  4  5  3  2  1  1  1  5  4  4  3  4  4  3  3  2  3  4  3  2  1  2
 10  6  5  5  5  4  4  3  2  1  1  5  5  4  4  3  5  4  3  3  2  5  4  3  2  1
 11  1  2  3  4  5  3  4  4  5  5  1  1  2  3  4  1  2  3  4  5  2  3  3  4  5
 12  2  1  2  3  4  4  3  4  4  5  1  1  1  2  3  2  1  2  3  4  3  2  3  3  4
 13  3  2  1  2  3  4  4  3  4  4  2  1  1  1  2  3  2  1  2  3  3  3  2  3  3
 14  4  3  2  1  2  5  4  4  3  4  3 ...
3
1

Table 4. Interceptor’s all AC pairs transition times

The Recognizer’s sensor is assumed to take three glimpses at the tracked object
during the single time step tracking phase (g = 3). The false positive and false negative
detection probabilities of a target are both 0.2 (u = v = 0.8).

The discount factor used for discounting rewards is γ = 0.05, which means that a
target intercepted at the end of the 12-hour time horizon has approximately 1/10 of the
operational value of a target intercepted at t = 0.
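
As a rough illustration of how strongly a discount factor of 0.05 penalizes a late interception, the sketch below evaluates the weight of a reward received at the end of the 48-step horizon (12 hours at 15-minute time steps) under three common per-step discounting conventions. Which convention the models actually use is defined elsewhere in the thesis and is not restated here, so all three forms, as well as the function name discount_weights, are assumptions made for illustration only.

```python
import math

GAMMA = 0.05        # discount factor quoted for the baseline scenario
HORIZON_STEPS = 48  # 12 hours at 15-minute time steps


def discount_weights(gamma: float, steps: int) -> dict:
    """Weight of a reward received after `steps` steps under three assumed conventions."""
    return {
        "1/(1+gamma)^t": (1.0 + gamma) ** (-steps),
        "(1-gamma)^t": (1.0 - gamma) ** steps,
        "exp(-gamma*t)": math.exp(-gamma * steps),
    }


if __name__ == "__main__":
    for name, weight in discount_weights(GAMMA, HORIZON_STEPS).items():
        print(f"{name:>14}: {weight:.3f}")
```

Under these assumed forms the end-of-horizon weight comes out at roughly 0.096, 0.085 and 0.091, respectively, so in each case a target intercepted at the end of the horizon is worth on the order of one tenth of a target intercepted at t = 0.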

The  value  of  the  probability  threshold  M ,  which  the  Recognizer  uses  to  flag  a 
tracked  object  as  either  a  likely  target  or  a  likely  neutral,  is  systematically  varied  to 
examine  its  effects  on  the  results. 

Using  a  MacBook  Pro  with  Dual-Core  2.53GHz  CPU  and  4GB  of  RAM, 
computing  the  expected  reward  of  the  Upper  Bound  Problem  with  this  scenario  has  a  run 
time  of  approximately  30  minutes,  while  computing  the  expected  reward  following  the 
heuristic  decision  policy  with  this  scenario  has  a  run  time  of  approximately  6  minutes. 

2.  Additional  Scenarios 

To better evaluate our suggested heuristic’s performance and to gain a deeper
insight into the nature of this problem, we perform a brief parametric study and
sensitivity analysis on several key parameters in the baseline scenario. This is
accomplished by running and analyzing several other scenarios based on the baseline
scenario. These scenarios include a zero-discounting scenario, a scenario with poor
sensor capabilities, a scenario with extended boarding time for the Interceptor, a 24-hour
time horizon scenario, and a 1,600 nm² 8-by-8 AOI scenario.

C.  RESULTS  AND  INSIGHTS 

The  following  sections  present  the  results  of  the  numerical  case  studies  with 
respect  to  the  different  scenarios  examined,  and  discuss  some  insights  derived  from  these 
results.  The  first  section  discusses  the  main  question  the  numerical  case  study  is  intended 
to answer: How well does the suggested heuristic perform? The subsequent sections
discuss additional insights and interesting results regarding the MIM operational
scenario.

1.  Performance  of  the  Heuristic  Decision  Policy 

The key motivation for this numerical case study is to evaluate the performance of
our suggested heuristic in the context of some representative scenarios. The main result is
the gap between the expected reward (i.e., discounted expected number of intercepted
targets) achieved by the heuristic decision policy and that achieved by the Upper Bound
Problem optimal decision policy. Note that this gap is an upper bound on the true gap
between the performance of the heuristic decision policy and the Original Problem
optimal decision policy, and so the gap presented here is a “worst case scenario”. Table 5
and Figure 11 present the results of both the heuristic and the Upper Bound Problem
decision policies in the baseline scenario, using several different values as the probability
threshold for deciding whether or not to call the Interceptor at the end of the tracking
phase of a detected object. The error bars in Figure 11 (and later in Figures 12-14)
represent the confidence intervals of the estimated expected reward following the
heuristic policy.

In this baseline scenario, the gap is about 30% on average for different values of
the probability threshold M, with relatively little sensitivity to the choice of M. This
means that using the heuristic decision policy results in at least approximately 70% of
the Original Problem expected reward.
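
The gap figures quoted in this section can be reproduced approximately from the tabulated rewards. The sketch below assumes the percentage gap is the shortfall of the heuristic expected reward relative to the Upper Bound expected reward; the function name percent_gap is hypothetical, and the inputs are a few of the rounded baseline values as printed in Table 5.

```python
def percent_gap(upper_bound: float, heuristic: float) -> float:
    """Shortfall of the heuristic relative to the Upper Bound, in percent (assumed definition)."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    return 100.0 * (upper_bound - heuristic) / upper_bound


# (M, Upper Bound expected reward, heuristic expected reward), rounded as printed
baseline_rows = [(0.05, 0.76, 0.54), (0.5, 0.68, 0.47), (0.9, 0.44, 0.30)]

for m, ub, h in baseline_rows:
    gap = percent_gap(ub, h)
    # The complement of the gap is the guaranteed fraction of the optimal reward.
    print(f"M = {m}: gap = {gap:.1f}%, heuristic achieves at least {100.0 - gap:.1f}%")
```

Because the tabulated rewards are rounded to two decimals, the recomputed gaps can differ from the printed ones by up to about one percentage point.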

Additional scenarios, presented in the subsequent sections, support the statement
that the heuristic is useful in obtaining a simple and effective decision policy for the MIM
operational scenario. With an appropriate choice of the probability threshold M, all
examined scenarios resulted in a gap of less than 40% between the heuristic expected
reward and the optimal expected reward.
Probability   Upper Bound      Heuristic        %gap
threshold M   expected reward  expected reward
0             0.72             0.50             30.9
0.01          0.75             0.52             30.5
0.05          0.76             0.54             29.7
0.1           0.77             0.52             32.2
0.15          0.77             0.53             30.8
0.25          0.74             0.51             31.0
0.35          0.74             0.51             31.4
0.5           0.68             0.47             30.8
0.75          0.63             0.41             34.3
0.9           0.44             0.30             32.2

Table 5. Baseline scenario results

Figure 11. Baseline scenario results (γ = 0.05, V_R = 80 kts, V_I = 20 kts, T = 12 hrs,
u = v = 0.8, g = 3): Upper Bound and Heuristic results versus probability threshold M

2. No Discounting of Rewards

This scenario is identical to the baseline scenario except we did not use any
discounting (γ = 0).


Probability   Upper Bound      Heuristic        %gap
threshold M   expected reward  expected reward
0             1.92             1.21             37.1
0.01          2.02             1.22             39.4
0.05          2.10             1.28             39.1
0.1           2.10             1.27             39.2
0.15          2.10             1.23             41.1
0.25          2.03             1.20             40.6
0.35          2.03             1.23             39.4
0.5           1.81             1.14             37.0
0.75          1.72             0.90             47.8

Table 6. No-discounting scenario results


Figure 12. No-discounting scenario results




In this no-discounting scenario, the gap is slightly larger than in the baseline
scenario, with about 40% gap on average for different values of the probability
threshold M. This means that using the heuristic decision policy (in this no-discounting
scenario) results in an expected reward that is at least approximately 60% of that in the
Original Problem. The shape of the graph in Figure 12 is very similar to the baseline
scenario result, with relatively little sensitivity to the choice of M. A possible
explanation for the slightly better results when using discounting over no-discounting is
the “greedy” nature of the heuristic. A myopic approach, as used in this heuristic, makes
more operational sense when there is higher operational value for intercepting targets in
the near future than for targets intercepted later in the future. Nevertheless, even without
any discounting at all, the myopic heuristic is useful, with only approximately 10% worse
performance than with a discounting factor of γ = 0.05 as used in the baseline scenario.

3. Low Quality Signature Recognition

In this scenario, we assumed the Recognizer sensor has poor signature
recognition, meaning it can hardly distinguish between a neutral and a target based on
the signature of the tracked object. The false positive and false negative detection
probabilities of a target were both assumed to be 0.4, with u = v = 0.6 (note that
u = v = 0.5 is a useless sensor with no ability to identify the type of a tracked object).
This scenario was run without discounting the intercepted targets (γ = 0). As expected,
the overall results (Table 7 and Figure 13) are worse than the corresponding scenario with
better sensor quality.

We can easily notice the effect of the poor sensor quality on the results of this run:
Except when choosing to intercept every detected target (M = 0), as we choose higher
values for the probability threshold M we get worse and worse results, meaning fewer and
fewer targets intercepted. This can be explained by the fact that with such a poor quality
signature recognition sensor, situations in which a tracked object has a probability of
being a target of 0.2 and above are rare (recall that the probability of an object being a
neutral prior to any sensing is higher than the probability of that object being a target,
because of the arrival probabilities of neutrals and targets to the AOI). As a result, any
threshold of 0.2 and above is rarely met.
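
To make this effect concrete, the following sketch applies a plain Bayes update over g = 3 glimpses. It is an illustration under stated assumptions rather than the update function used in the thesis: glimpses are treated as independent, u is read as the probability that a glimpse of a target reports "target", v as the probability that a glimpse of a neutral reports "neutral", and the prior of 0.1 is a hypothetical value chosen only because neutrals are more likely than targets a priori. The function name posterior_target is also hypothetical.

```python
def posterior_target(prior: float, u: float, v: float, glimpses: list) -> float:
    """Posterior probability that a tracked object is a target (simple Bayes update).

    prior:    probability of being a target before any glimpse
    u:        P(glimpse reports "target"  | object is a target)
    v:        P(glimpse reports "neutral" | object is a neutral)
    glimpses: True for a "target" reading, False for a "neutral" reading
    """
    like_target, like_neutral = 1.0, 1.0
    for says_target in glimpses:
        like_target *= u if says_target else (1.0 - u)
        like_neutral *= (1.0 - v) if says_target else v
    numerator = prior * like_target
    return numerator / (numerator + (1.0 - prior) * like_neutral)


PRIOR = 0.1  # hypothetical prior, not a value taken from the thesis

for quality in (0.8, 0.6, 0.5):
    p = posterior_target(PRIOR, quality, quality, [True, True, True])
    print(f"u = v = {quality}: posterior after three 'target' readings = {p:.3f}")
```

With this hypothetical prior, three unanimous "target" readings give a posterior of about 0.88 for u = v = 0.8 but only about 0.27 for u = v = 0.6, and any mixed reading with the poorer sensor stays below 0.2. With u = v = 0.5 the posterior equals the prior, which is the "useless sensor" case.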

This situation of using poor sensors is obviously unwanted, but if there is no way
to avoid it, a very low probability threshold is best. Using a high probability threshold in
this scenario does not make any sense, and so we ignore the poor results of the heuristic
in these cases (a gap as bad as approximately 94%) when evaluating the overall heuristic
performance.


Probability   Upper Bound      Heuristic        %gap
threshold M   expected reward  expected reward
0             1.92             1.23             36.2
0.01          1.92             1.20             37.5
0.05          1.92             1.22             36.3
0.1           1.92             1.19             38.4
0.15          1.83             1.15             37.3
0.25          1.62             1.01             37.5
0.35          1.51             0.79             47.9
0.5           0.99             0.55             45.0
0.75          0.28             0.02             93.4

Table 7. Poor signature recognition scenario results

Figure 13. Poor signature recognition scenario results

4. How Often Should We Call the Interceptor?


Inspecting the results of the baseline scenario (Table 5 and Figure 11), we notice
that the “best” probability threshold in the heuristic context is M = 0.05. This threshold
value results in an expected number of 0.76 intercepted targets in the Upper Bound
Problem, which is practically the same as the optimal 0.77 targets. A threshold of
M = 0.05 also results in the lowest observed gap of 29.7%. All of the above suggests that
M = 0.05 is a good choice when creating the CONOPS of an interdiction force under the
assumptions of the baseline scenario.

This result appears counterintuitive, as a higher threshold seems the natural choice. It
seems to make sense to choose a threshold of at least 0.5, meaning that we should call the
Interceptor (and “waste” the time associated with the interception) only when a tracked
object is more likely to be a target than a neutral. The first guess of an effective
threshold value, when this scenario was first implemented and tested, was in fact
M = 0.8 (the motivation was to set the threshold high enough to call the Interceptor only
when it is most likely that a target will eventually be intercepted). The fact that such high
probability threshold values perform worse than much lower values such as M = 0.05 was
initially surprising.

To better understand these counterintuitive results, we evaluated several scenarios with
longer interception times. In the baseline scenario, the time the interdiction force “pays”
for calling in the Interceptor, rather than letting the tracked object go without
intercepting it, is only the travel time of the Interceptor from its current location to the
interception location. Speculating that this “time-penalty” for risking an interception is
too low, we constructed and analyzed two new scenarios in which we artificially extended
the time of interception by introducing a boarding time for the interception of objects.
This means that every time the Interceptor is called in to intercept a likely target, the time
it takes until the interception is complete is the sum of the travel time of the Interceptor to
the interception location and the boarding time. A longer boarding time discourages
calling the Interceptor “too often.” Table 8 and Figure 14 compare the results of these
three scenarios: the baseline scenario with 0 boarding time, a scenario with 5 time steps
boarding time, and a scenario with 20 time steps boarding time.
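
The two ingredients of this comparison, the threshold rule and the time cost of an interception, can be written down in a few lines. The sketch below is a minimal illustration rather than the thesis implementation: the function names are hypothetical, the use of "greater than or equal" in the threshold test is an assumption (chosen so that M = 0 calls the Interceptor for every tracked object), and the travel time of 4 time steps is an arbitrary example value.

```python
STEP_MINUTES = 15  # length of one time step


def call_interceptor(p_target: float, threshold_m: float) -> bool:
    """Flag the tracked object as a likely target when its probability reaches M."""
    return p_target >= threshold_m


def interception_time(travel_steps: int, boarding_steps: int = 0) -> int:
    """Total time steps the Interceptor is committed: travel plus boarding."""
    return travel_steps + boarding_steps


# Example: an object with a 0.3 probability of being a target, 4 time steps away.
for m in (0.05, 0.5):
    print(f"M = {m}: call Interceptor? {call_interceptor(0.3, m)}")

for boarding in (0, 5, 20):
    steps = interception_time(4, boarding)
    print(f"boarding = {boarding:2d} steps: committed for {steps} steps "
          f"({steps * STEP_MINUTES} minutes)")
```

With a boarding time of 20 time steps, a single call in this example commits the Interceptor for 24 of the 48 time steps in the baseline horizon, which is why calling it for unlikely targets becomes costly.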

Probability   Boarding time = 0 time steps     Boarding time = 5 time steps     Boarding time = 20 time steps
threshold M   Upper Bound  Heuristic    %gap   Upper Bound  Heuristic    %gap   Upper Bound  Heuristic    %gap
              exp. reward  exp. reward         exp. reward  exp. reward         exp. reward  exp. reward
0             0.72         0.50         30.9   0.35         0.29         17.3   0.09         0.09         3.5
0.01          0.75         0.52         30.5   0.39         X            X      0.10         X            X
0.05          0.76         0.54         29.7   0.42         0.34         19.7   0.12         0.11         12.1
0.1           0.77         0.52         32.2   0.45         X            X      X            X            X
0.15          0.77         0.53         30.8   0.45         0.36         21.1   X            0.12         X
0.25          0.74         0.51         31.0   0.46         X            X      0.14         X            X
0.35          0.74         0.51         31.4   0.46         0.35         23.3   0.14         0.13         10.3
0.5           0.68         0.47         30.8   0.43         0.33         23.2   0.14         0.12         13.8
0.75          0.63         0.41         34.3   0.41         0.30         25.5   X            X            X
0.9           0.44         0.30         32.2   0.30         0.22         26.6   0.11         0.09         18.5

Table 8. Sensitivity to boarding time (X marks scenarios which have not been calculated)


The  main  motivation  for  the  analysis  of  these  scenarios  is  to  examine  the  shape  of 
the  graphs,  and  so  not  all  data  points  are  computed  for  both  scenarios. 


Figure 14. Sensitivity to boarding time

The results of the scenario with five time steps boarding time are indeed more
intuitive than the baseline scenario with 0 boarding time. Note the shape of the graphs in
the two new scenarios, for both the Upper Bound expected reward and the heuristic
expected reward, which suggests it is best to choose higher values for the probability
threshold parameter M than in the baseline scenario. A threshold value of approximately
0.2 seems to have the best results in the scenario with boarding times of five time steps,
while a value of approximately 0.4 seems to be the best in the scenario with boarding
times of 20 time steps. These results make sense since, with a longer overall interception
time, it is less efficient to call the Interceptor too often.

5. Extended Time Horizon

All the scenarios presented so far had a time horizon of 12 hours (T = 48 time
steps). To confirm the heuristic’s performance and insights presented earlier in this
chapter, we examine a scenario with a 24-hour time horizon (T = 96 time steps, with all
other parameters as in the baseline scenario). The heuristic’s expected reward in this
scenario is 0.57 and the Upper Bound expected reward is 0.85, with a gap of 33%. The
heuristic performance in this scenario is very close to the observed performance in the
baseline scenario with half the length of the time horizon. This encouraging result
supports our statement that the difference between the heuristic and optimal expected
rewards is less than ~30%.

6.  8-by-8  AOI 

Another scenario examined for confirming our results and insights is a scenario
with a larger AOI: a 1,600 nm² AOI (8-by-8 ACs) instead of the 625 nm² AOI (5-by-5
ACs) used in all previous scenarios. The number of ACs in this enlarged AOI is
64/25 ≈ 2.5 times the number of ACs in the baseline scenario. Recall that the run time of
the Upper Bound Problem backward calculation is O(T · |A|² · (3·|A| + 2)), which means
that the running time for the Upper Bound calculation of this 8-by-8 scenario is
approximately 16 times longer than the 5-by-5 scenario run time. As the baseline scenario
Upper Bound calculation takes approximately 30 minutes, this 8-by-8 scenario indeed
takes approximately 8 hours. For this reason, we only examined a single run of this
8-by-8 scenario. The rest of the scenario parameters are as in the baseline scenario.
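
The run-time comparison above can be checked with a few lines of arithmetic. The sketch below assumes that the cost of the backward calculation grows as T · n² · (3n + 2), where n is the number of ACs; this form is an assumed reading of the run-time expression, and the function name relative_cost is hypothetical.

```python
def relative_cost(n_cells: int, horizon_steps: int = 48) -> int:
    """Operation count up to a constant factor, under the assumed T * n^2 * (3n + 2) form."""
    return horizon_steps * n_cells ** 2 * (3 * n_cells + 2)


BASELINE_MINUTES = 30.0  # reported run time of the 5-by-5 Upper Bound calculation

ratio = relative_cost(8 * 8) / relative_cost(5 * 5)
estimated_hours = BASELINE_MINUTES * ratio / 60.0

print(f"cost ratio (8-by-8 vs 5-by-5): {ratio:.1f}")
print(f"estimated 8-by-8 run time: {estimated_hours:.1f} hours")
```

Under this assumption the ratio is about 16.5 and the estimated run time about 8.3 hours, in line with the figures of approximately 16 times and approximately 8 hours quoted in the text.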

The  heuristic  expected  reward  in  this  scenario  is  found  to  be  0.45,  while  the 
Upper  Bound  expected  reward  is  0.62,  with  a  gap  of  28%.  The  result  of  this  scenario  is  in 
agreement  with  previously  presented  results  and,  therefore,  supports  our  statements 
regarding  the  heuristic  performance. 

V. CONCLUSIONS AND RECOMMENDATIONS


A. CONCLUSIONS

The goal of this thesis is to develop a useful tactical decision aid for an
interdiction force in a MIM scenario. We discuss the reasons why finding the exact
optimal decision policy in this scenario is an intractable problem, and so we suggest a
heuristic sub-optimal decision policy instead of the optimal one.

Based on the analysis of several numerical case studies, we conclude that the
number of intercepted targets following the heuristic decision policy is at least 60% of the
number of targets intercepted following the optimal decision policy. This percentage
improves to approximately 70% when discounting intercepted targets with a discount
factor of 0.05 with respect to time steps of 15 minutes.

Based on the observed heuristic performance in the numerical case study, we
recommend the use of this heuristic in any MIM scenario which closely resembles the
MIM scenario discussed in this thesis. While 40% is indisputably a significant gap, the
heuristic decision policy calculation can be completed almost instantaneously while the
true optimal decision policy is intractable to compute in any plausible operational
situation.

Furthermore, we gained additional insights regarding the effect that several
operational and technical parameters of the interdiction force have on the expected
outcome of the MIM. Such insights include how to better choose the probability
threshold for intercepting a tracked object, and what performance we should expect from
our heuristic and from any other feasible decision policy (including the optimal one)
under different scenarios.

This thesis is a first attempt to obtain an optimal operational policy for a
synchronized sensor-interceptor maritime interdiction force or to quantitatively analyze a
heuristic approach to this problem. The models, methodology and even the
implementation code developed in this thesis can be easily applied to future research of
this problem or similar ones, as briefly discussed in the next section.


B. SUGGESTED WORK AHEAD

There  are  several  interesting  research  avenues  to  be  extended  from  this  thesis. 
New  heuristic  decision  policies  can  be  developed  and  evaluated  with  minimal 
modifications  to  the  heuristic  decision  policy  suggested  in  this  thesis. 

An improved implementation of the models presented in this thesis, with faster run
time and more efficient memory use, can enable the analysis of bigger and more realistic
operational scenarios that will reinforce our confidence in the heuristic performance
presented in this research.

Extensions  of  the  scenarios  and  models  in  this  thesis  can  include  the  introduction 
of  multiple  Recognizers  and/or  Interceptors,  the  optimization  of  the  Interceptor  location 
throughout  the  scenario  (instead  of  waiting  at  its  current  location  until  called  for  by  the 
Recognizer),  or  a  more  realistic  interception  model  for  the  phase  between  the  flagging  of 
a  tracked  object  as  a  likely  target  and  the  actual  interception. 

Another extension of this thesis can be to introduce “Red-team intelligence”,
meaning that we allow the targets to act strategically and react to the interdiction force’s
actions. Such possible reactions must be taken into account when optimizing the
operational policy of the interdiction force, using game-theoretic approaches.

An  alternative  approach  to  the  numerical  analysis  of  any  suggested  heuristic  is  to 
attempt  to  analytically  prove  an  approximated  sub-optimal  decision  policy. 

APPENDIX. MATLAB IMPLEMENTATION OVERVIEW


The implementation of all models and algorithms in this thesis was coded in
MATLAB 7 (R14). All runs were done on a MacBook Pro with an Intel Dual-Core
2.53GHz CPU with 4GB RAM, on Windows XP Professional.

The MATLAB code (Table 9) implements the heuristic expected value
calculation (H) and the Upper Bound Problem expected value calculation (U). The code
consists of two main run files and 13 supporting functions (some of which are used in
both main run files). Altogether, there are approximately 2000 lines of code.


Filename            Description                                                                H, U
Heuristic_calc.m    Heuristic expected value calculation (main run file)                       X
Upper_Bound_calc.m  Upper Bound Problem expected value calculation (main run file)             X
create_AOI.m        Creates the AOI: Markov transition matrices, arrival probabilities and     X X
                    ACs trace (translation between ACs index and coordinates)
travel_times.m      Calculates all ACs pairs travel times for both Recognizer and Interceptor  X X
steady_state.m      Calculates the targets and neutrals steady-state probability vectors       X X
X_heuristic.m       Calculates the current decision following the heuristic policy             X
w_realization.m     Get a realization of the information according to the appropriate PMF      X
reward.m            Calculates the obtained reward                                             X
sM.m                Calculates the new state after decision is fathomed                        X
pi_hat.m            Targets probability vector update function                                 X
pi_hat_a.m          Single AC target probability update function                               X X
theta_hat.m         Neutrals probability vector update function                                X
theta_hat_a.m       Single AC neutral probability update function                              X X
theta_rec.m         Calculates the probability for a target after the tracking phase           X X
plot_prob_map.m     Plots the probability vectors as probability color maps                    X X

Table 9. MATLAB code filenames and descriptions